1. Comparison of Sampling-Based Algorithms for Multisensor Distributed Target Tracking. Nguyen, Trang (16 May 2003)
Nonlinear filtering is central to estimation, since most real-world problems are nonlinear. Recently, considerable progress in nonlinear filtering theory has been made in the area of sampling-based methods, including both random (Monte Carlo) and deterministic (quasi-Monte Carlo) sampling, and their combination. This work considers the problem of tracking a maneuvering target in a multisensor environment. A novel scheme for distributed tracking is employed that utilizes a nonlinear target model and estimates from local (sensor-based) estimators. The resulting estimation problem is highly nonlinear and thus quite challenging. In order to evaluate the performance capabilities of the architecture considered, advanced sampling-based nonlinear filters are implemented: the particle filter (PF), the unscented Kalman filter (UKF), and the unscented particle filter (UPF). Results from extensive Monte Carlo simulations using different configurations of these algorithms are obtained to compare their effectiveness in solving the distributed target tracking problem.
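As a rough illustration of the sampling-based idea behind the filters compared in this work, below is a minimal bootstrap particle filter for a generic scalar nonlinear state-space model; the transition function f, measurement function h, noise scales, and Gaussian noise assumption are placeholders, not the thesis's tracking model.

```python
import numpy as np

def bootstrap_particle_filter(y, f, h, q_std, r_std, x0_particles, rng=None):
    """Minimal bootstrap (SIR) particle filter sketch.

    y            : (T,) array of scalar measurements
    f, h         : state-transition and measurement functions (assumed known)
    q_std, r_std : process and measurement noise standard deviations
    x0_particles : (N,) initial particle cloud
    Returns the (T,) sequence of posterior-mean state estimates.
    """
    rng = np.random.default_rng() if rng is None else rng
    particles = np.asarray(x0_particles, dtype=float)
    N = particles.size
    estimates = np.empty(len(y))
    for t, yt in enumerate(y):
        # Propagate particles through the (nonlinear) dynamics plus process noise.
        particles = f(particles) + rng.normal(0.0, q_std, size=N)
        # Weight by the measurement likelihood (Gaussian sensor noise assumed).
        w = np.exp(-0.5 * ((yt - h(particles)) / r_std) ** 2)
        w += 1e-300                      # guard against all-zero weights
        w /= w.sum()
        estimates[t] = np.sum(w * particles)
        # Multinomial resampling to avoid weight degeneracy.
        particles = particles[rng.choice(N, size=N, p=w)]
    return estimates
```

For comparison, the UKF propagates a small deterministic set of sigma points instead of a random particle cloud, and the UPF combines the two by using UKF-style proposals inside the particle filter.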
2. Data contamination versus model deviation. Fonseca, Viviane Grunert da (January 1999)
No description available.
3. Approximation methods and inference for stochastic biochemical kinetics. Schnoerr, David Benjamin (January 2016)
Recent experiments have shown the fundamental role that random fluctuations play in many chemical systems in living cells, such as gene regulatory networks. Mathematical models are thus indispensable to describe such systems and to extract relevant biological information from experimental data. Recent decades have seen a considerable amount of modelling effort devoted to this task. However, current methodologies still present outstanding mathematical and computational hurdles. In particular, models which retain the discrete nature of particle numbers necessarily incur severe computational overheads, greatly complicating the tasks of characterising statistically the noise in cells and inferring parameters from data.

In this thesis we study analytical approximations and inference methods for stochastic reaction dynamics. The chemical master equation is the accepted description of stochastic chemical reaction networks whenever spatial effects can be ignored. Unfortunately, for most systems no analytic solutions are known and stochastic simulations are computationally expensive, making analytic approximations appealing alternatives. In the case where spatial effects cannot be ignored, such systems are typically modelled by means of stochastic reaction-diffusion processes. As in the non-spatial case, an analytic treatment is rarely possible and simulations quickly become infeasible. In particular, the calibration of models to data constitutes a fundamental unsolved problem.

In the first part of this thesis we study two approximation methods for the chemical master equation: the chemical Langevin equation and moment closure approximations. The chemical Langevin equation approximates the discrete-valued process described by the chemical master equation by a continuous diffusion process. Despite being frequently used in the literature, it remains unclear how the boundary conditions behave under this transition from discrete to continuous variables. We show that this boundary problem renders the chemical Langevin equation mathematically ill-defined if it is defined in real space, due to the occurrence of square roots of negative expressions. We show that this problem can be avoided by extending the state space from real to complex variables, and we prove that this approach gives rise to real-valued moments and thus admits a probabilistic interpretation. Numerical examples demonstrate better accuracy of the resulting complex chemical Langevin equation than various real-valued implementations proposed in the literature.

Moment closure approximations aim at directly approximating the moments of a process, rather than its distribution. The chemical master equation gives rise to an infinite system of ordinary differential equations for the moments of a process, and moment closure approximations close this infinite hierarchy by expressing moments above a certain order in terms of lower-order moments. This is an ad hoc approximation without any systematic justification, and the question arises whether the resulting equations always lead to physically meaningful results. We find that this is not always the case. Rather, moment closure approximations may give rise to diverging time trajectories or otherwise unphysical behaviour, such as negative mean values or unphysical oscillations. They thus fail to admit a probabilistic interpretation in these cases, and care is needed when using them in order not to draw wrong conclusions.
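To make the chemical Langevin equation's boundary problem concrete, here is a minimal Euler-Maruyama simulation of the CLE for a simple birth-death process. The clipping of negative propensities is one of the ad hoc real-valued workarounds alluded to above, not the complex-valued formulation developed in the thesis.

```python
import numpy as np

def cle_birth_death(k_birth, k_death, x0, dt, n_steps, rng=None):
    """Euler-Maruyama simulation of the chemical Langevin equation for a
    birth-death process (0 -> X with rate k_birth, X -> 0 with rate k_death * x).

    Near x = 0 the death propensity k_death * x can be driven negative by the
    noise, so the square roots below become ill-defined; clipping at zero is a
    crude real-valued fix, which is exactly the boundary issue discussed above.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.empty(n_steps + 1)
    x[0] = x0
    for i in range(n_steps):
        a1 = k_birth               # birth propensity
        a2 = k_death * x[i]        # death propensity (may be pushed negative)
        drift = a1 - a2
        noise = (np.sqrt(max(a1, 0.0)) * rng.normal()
                 - np.sqrt(max(a2, 0.0)) * rng.normal())
        x[i + 1] = x[i] + drift * dt + noise * np.sqrt(dt)
    return x
```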
In the second part of this work we consider systems where spatial effects have to be taken into account. In general, such stochastic reaction-diffusion processes are only defined in an algorithmic sense without any analytic description, and it is hence not even conceptually clear how to define likelihoods of experimental data for such processes. Calibration of such models to experimental data thus constitutes a highly non-trivial task. We derive here a novel inference method by establishing a basic relationship between stochastic reaction-diffusion processes and spatio-temporal Cox processes, two classes of models that had so far been considered distinct from each other. This novel connection makes it possible to compute approximate likelihoods and thus to perform inference tasks for stochastic reaction-diffusion processes. The accuracy and efficiency of this approach are demonstrated by means of several examples.

Overall, this thesis advances the state of the art of modelling methods for stochastic reaction systems. It deepens the understanding of several existing methods by elucidating their fundamental limitations, and it develops several novel approximation and inference methods.
4. Estimation and the Stress-Strength Model. Brownstein, Naomi (01 January 2007)
The paper considers statistical inference for R = P(X < Y) in the case when both X and Y have generalized gamma distributions. The maximum likelihood estimators of R are developed in the cases when either all three parameters of the generalized gamma distributions are unknown or the shape parameters are known. In addition, objective Bayes estimators based on noninformative priors are constructed when the shape parameters are known. Finally, the uniform minimum variance unbiased estimators (UMVUE) are derived in the case when only the scale parameters are unknown.
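As a simple numerical companion to these estimators, R = P(X < Y) can be approximated by straightforward simulation once generalized gamma parameters are fixed. The sketch below uses SciPy's gengamma parameterization and arbitrary placeholder parameter values, which need not match the paper's.

```python
import numpy as np
from scipy.stats import gengamma

def stress_strength_mc(a_x, c_x, scale_x, a_y, c_y, scale_y, n=200_000, seed=0):
    """Monte Carlo estimate of R = P(X < Y) for generalized gamma X and Y.

    SciPy's gengamma uses two shape parameters (a, c) plus a scale; these may
    map differently onto the paper's parameterization.
    """
    x = gengamma.rvs(a_x, c_x, scale=scale_x, size=n, random_state=seed)
    y = gengamma.rvs(a_y, c_y, scale=scale_y, size=n, random_state=seed + 1)
    return float(np.mean(x < y))

# Example with arbitrary placeholder parameters:
print(stress_strength_mc(2.0, 1.5, 1.0, 3.0, 1.5, 1.2))
```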
5. BALLWORLD: A Framework for Learning Statistical Inference and Stream Processing. Ravali, Yeluri (January 2017)
No description available.
6. Statistical Learning for Sequential Unstructured Data. Xu, Jingbin (30 July 2024)
Unstructured data, which cannot be organized into predefined structures, such as text, human behavior status, and system logs, is often presented in a sequential format with inherent dependencies. Probabilistic models are commonly used to capture these dependencies in the data generation process through latent parameters and can naturally extend into hierarchical forms. However, these models rely on the correct specification of assumptions about the sequential data generation process, which often limits the scalability of their learning. The emergence of neural network tools has enabled scalable learning for high-dimensional sequential data. From an algorithmic perspective, efforts are directed towards reducing dimensionality and representing unstructured data units as dense vectors in low-dimensional spaces, learned from unlabeled data, a practice often referred to as numerical embedding. While these representations offer measures of similarity, automated generalization, and semantic understanding, they frequently lack the statistical foundations required for explicit inference. This dissertation aims to develop statistical inference techniques tailored to the analysis of unstructured sequential data, with applications in the field of transportation safety.

The first part of the dissertation presents a two-stage method. It adopts numerical embedding to map large-scale unannotated data into numerical vectors. Subsequently, a kernel test using maximum mean discrepancy is employed to detect abnormal segments within a given time period. Theoretical results show that learning from the numerical vectors is equivalent to learning directly from the raw data. A real-world example illustrates how a driver's mismatched visual behavior occurred during a lane change.

The second part of the dissertation introduces a two-sample test for comparing text generation similarity. The hypothesis tested is whether the probabilistic mapping measures that generate textual data are identical for two groups of documents. The proposed test compares the likelihood of text documents, estimated through neural network-based language models under the autoregressive setup. The test statistic is derived from an estimation and inference framework that first approximates the data likelihood on an estimation set before performing inference on the remaining part. The theoretical results indicate that the test statistic is asymptotically normal under mild conditions. Additionally, a multiple data-splitting strategy is utilized, combining p-values into a unified decision to enhance the test's power.

The third part of the dissertation develops a method to measure differences in text generation between a benchmark dataset and a comparison dataset, focusing on word-level generation variations. This method uses the sliced-Wasserstein distance to compute a contextual discrepancy score, and a resampling method establishes a threshold to screen the scores. Crash report narratives are analyzed to compare crashes involving vehicles equipped with level 2 advanced driver assistance systems and those involving human drivers.

/ Doctor of Philosophy / Unstructured data, such as texts, human behavior records, and system logs, cannot be neatly organized. This type of data often appears in sequences with natural connections. Traditional methods use models to understand these connections, but these models depend on specific assumptions, which can limit their effectiveness. New tools using neural networks have made it easier to work with large and complex data. These tools help simplify data by turning it into smaller, manageable pieces, a process known as numerical embedding. While this helps in understanding the data better, the subsequent inferential analysis still requires a statistical foundation. This dissertation aims to develop statistical inference techniques for analyzing unstructured sequential data, focusing on transportation safety. The first part of the dissertation introduces a two-step method. First, it transforms large-scale unorganized data into numerical vectors. Then, it uses a statistical test to detect unusual patterns over a period of time. For example, it can identify when a driver's visual behavior is not properly aligned with the attention demands of driving during lane changes. The second part of the dissertation presents a method to compare the similarity of text generation. It tests whether the way texts are generated is the same for two groups of documents. This method uses neural network-based models to estimate the likelihood of text documents. Theoretical results show that as more data are observed, the distribution of the test statistic approaches the desired distribution under certain conditions. Additionally, combining multiple data splits improves the test's power. The third part of the dissertation constructs a score to measure differences in text generation processes, focusing on word-level differences. This score is based on a specific distance measure. To check that a difference is not a false discovery, a screening threshold is established using a resampling technique. If the score exceeds the threshold, the difference is considered significant. An application of this method compares crash reports from vehicles with advanced driver assistance systems to those from human-driven vehicles.
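The kernel test used in the first part of the dissertation is built on the maximum mean discrepancy between two samples of embedded vectors. A minimal biased Gaussian-kernel MMD estimator is sketched below as an illustration; the median-distance bandwidth heuristic is an assumption, and the dissertation's actual test and calibration are not reproduced here.

```python
import numpy as np

def mmd_gaussian(X, Y, bandwidth=None):
    """Biased MMD^2 estimate between samples X (n, d) and Y (m, d) with an RBF kernel.

    If no bandwidth is given, the median pairwise-distance heuristic is used.
    """
    X, Y = np.asarray(X, float), np.asarray(Y, float)
    Z = np.vstack([X, Y])
    # Pairwise squared Euclidean distances over the pooled sample.
    sq = np.sum((Z[:, None, :] - Z[None, :, :]) ** 2, axis=-1)
    if bandwidth is None:
        bandwidth = np.sqrt(np.median(sq[sq > 0]) / 2.0)
    K = np.exp(-sq / (2.0 * bandwidth ** 2))
    n, m = len(X), len(Y)
    Kxx, Kyy, Kxy = K[:n, :n], K[n:, n:], K[:n, n:]
    return Kxx.mean() + Kyy.mean() - 2.0 * Kxy.mean()
```

Large values of this statistic, relative to a permutation or bootstrap reference, indicate that the two embedded samples come from different distributions.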
7. Inferring hidden features in the Internet. Gürsun, Gonca (23 July 2024)
The Internet is a large-scale decentralized system that is composed of thousands of independent networks. In this system, there are two main components, interdomain routing and traffic, that are vital inputs for many tasks such as traffic engineering, security, and business intelligence. However, due to the decentralized structure of the Internet, global knowledge of both interdomain routing and traffic is hard to come by. In this dissertation, we address a set of statistical inference problems with the goal of extending the knowledge of the interdomain-level Internet.
In the first part of this dissertation we investigate the relationship between the interdomain topology and an individual network's inference ability. We first frame the questions through abstract analysis of idealized topologies, and then use actual routing measurements and topologies to study the ability of real networks to infer traffic flows.
In the second part, we study the ability of networks to identify which paths flow through their network. We first explain why answering this question is surprisingly hard: by the design of interdomain routing, each network can learn only a limited set of routes. Network operators therefore have to rely on observed traffic; however, observed traffic can show that a particular route passes through their network, but not that a route does not. In order to solve this routing inference problem, we propose a nonparametric inference technique that works quite accurately. The key idea behind our technique is to measure the distances between destinations. To accomplish that, we define a metric called Routing State Distance (RSD) that measures distance in terms of routing similarity.
Finally, in the third part, we study our new metric, RSD, in detail. Using RSD we address the important and difficult problem of characterizing the set of paths between networks. The collection of paths across networks is a rich source for understanding important phenomena in the Internet, since path selections are driven by the economic and performance considerations of the networks. We show that RSD has a number of appealing properties that help uncover these hidden phenomena.
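The abstract does not spell out how RSD is computed, so the sketch below is only one plausible reading: count the vantage points whose observed route toward one destination differs from their route toward another. The data layout (dicts keyed by vantage-point id) is an assumption made purely for illustration.

```python
def routing_state_distance(routes_x, routes_y):
    """One plausible reading of RSD (an assumption, not the thesis's exact definition):
    the number of vantage points whose chosen route toward destination x differs
    from their chosen route toward destination y.

    routes_x, routes_y: dicts mapping vantage-point id -> route identifier
    (e.g. next hop) observed toward the two destinations.
    """
    common = set(routes_x) & set(routes_y)
    return sum(routes_x[v] != routes_y[v] for v in common)
```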
8. Efficient Computation of Probabilities of Events Described by Order Statistics and Application to a Problem of Queues. Jones, Lee K.; Larson, Richard C. (May 1900)
Consider a set of N i.i.d. random variables in [0, 1]. When the experimental values of the random variables are arranged in ascending order, one has the order statistics of the set of random variables. In this note an O(N^3) algorithm is developed for computing the probability that the order statistics vector lies in a given rectangle. The new algorithm is then applied to a problem of statistical inference in queues. Illustrative computational results are included.
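The O(N^3) algorithm itself is not given in this abstract, but the quantity it computes can be checked by brute-force simulation for i.i.d. Uniform(0, 1) variables, as in the sketch below; the rectangle bounds in the example are arbitrary.

```python
import numpy as np

def order_stats_rectangle_prob_mc(a, b, n_trials=200_000, seed=0):
    """Monte Carlo estimate of P(a_i <= X_(i) <= b_i for all i), where
    X_(1) <= ... <= X_(N) are the order statistics of N i.i.d. Uniform(0, 1)
    variables. This is only a simulation check, not the O(N^3) algorithm
    developed in the paper."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    rng = np.random.default_rng(seed)
    N = len(a)
    samples = np.sort(rng.random((n_trials, N)), axis=1)   # order statistics per trial
    inside = np.all((samples >= a) & (samples <= b), axis=1)
    return inside.mean()

# Example with arbitrary rectangle bounds:
print(order_stats_rectangle_prob_mc(a=[0.0, 0.2, 0.4], b=[0.4, 0.7, 1.0]))
```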
9. Complex Feature Recognition: A Bayesian Approach for Learning to Recognize Objects. Viola, Paul (01 November 1996)
We have developed a new Bayesian framework for visual object recognition which is based on the insight that images of objects can be modeled as a conjunction of local features. This framework can be used to derive both an object recognition algorithm and an algorithm for learning the features themselves. The overall approach, called complex feature recognition or CFR, is unique for several reasons: it is broadly applicable to a wide range of object types, it makes constructing object models easy, it is capable of identifying either the class or the identity of an object, and it is computationally efficient, requiring time proportional to the size of the image. Instead of a single simple feature such as an edge, CFR uses a large set of complex features that are learned from experience with model objects. The response of a single complex feature contains much more class information than does a single edge. This significantly reduces the number of possible correspondences between the model and the image. In addition, CFR takes advantage of a type of image processing called 'oriented energy'. Oriented energy is used to efficiently pre-process the image to eliminate some of the difficulties associated with changes in lighting and pose.
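As commonly defined, oriented energy is the summed squared response of an even/odd quadrature pair of oriented filters, which makes the response insensitive to contrast polarity. The sketch below builds a generic Gabor pair; the filter frequency, size, envelope width, and orientation handling are placeholder choices rather than the paper's settings.

```python
import numpy as np
from scipy.signal import convolve2d

def oriented_energy(image, theta, freq=0.2, size=15, sigma=3.0):
    """Oriented energy at angle `theta`: squared responses of an even (cosine)
    and odd (sine) Gabor filter pair, summed point-wise. A generic sketch of
    the idea, not necessarily the paper's exact filters."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)        # coordinate along the orientation
    envelope = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
    even = envelope * np.cos(2.0 * np.pi * freq * xr)  # even-symmetric filter
    odd = envelope * np.sin(2.0 * np.pi * freq * xr)   # odd-symmetric filter
    r_even = convolve2d(image, even, mode="same", boundary="symm")
    r_odd = convolve2d(image, odd, mode="same", boundary="symm")
    return r_even ** 2 + r_odd ** 2                    # polarity-invariant response
```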
10. Essays on Efficiency Analysis. Asava-Vallobh, Norabajra (May 2009)
This dissertation consists of four essays that investigate efficiency analysis, especially when non-discretionary inputs exist. A new multi-stage Data Envelopment Analysis (DEA) approach for non-discretionary inputs, statistical inference discussions, and applications are provided.

In the first essay, I propose a multi-stage DEA model to address the non-discretionary input issue and provide a simulation analysis that illustrates the implementation and potential advantages of the new approach relative to the leading existing multi-stage models of non-discretionary inputs, such as Ruggiero's 1998 model and Fried, Lovell, Schmidt, and Yaisawarng's 2002 model. Furthermore, the simulation results also suggest that the constant returns to scale assumption seems to be preferred when observations have similar sizes, but variable returns to scale may be more appropriate when their scales differ.

In the second essay, I comment on Simar and Wilson's 2007 work. My simulation evidence shows that traditional statistical inference does not underperform the bootstrap process proposed by Simar and Wilson. Moreover, my results also show that the truncated model recommended by Simar and Wilson does not outperform the tobit model in terms of statistical inference. Therefore, the traditional method (the t-test) and the tobit model should continue to be considered applicable tools for a multi-stage DEA model with non-discretionary inputs, despite contrary claims by Simar and Wilson.

The third essay presents an application of my new approach to data from Texas school districts. The results suggest that a lagged variable (e.g., students' performance in the previous year), a variable which has been used in the literature, may not play an important role in determining efficiency scores. This implies that one may not need access to panel data on individual scores to study school efficiency.

My final essay applies a standard DEA model and the Malmquist productivity index to commercial banks in Thailand in order to compare their efficiency and productivity before and after Thailand's Financial Sector Master Plan (FSMP), which was implemented in 2004.
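For reference, the input-oriented, constant-returns-to-scale DEA score that underlies multi-stage DEA models can be written as a small linear program. The sketch below uses SciPy's linprog with generic input/output matrices; the variable names and the CRS, input-oriented formulation are illustrative assumptions, not the essays' exact specification.

```python
import numpy as np
from scipy.optimize import linprog

def dea_crs_input_oriented(X, Y, j0):
    """Input-oriented CRS (CCR) DEA efficiency of unit j0.

    X : (m, n) input matrix,  Y : (s, n) output matrix; columns index the n units.
    Solves  min theta  s.t.  X @ lam <= theta * X[:, j0],  Y @ lam >= Y[:, j0],  lam >= 0.
    """
    m, n = X.shape
    s = Y.shape[0]
    # Decision vector: [theta, lam_1, ..., lam_n]
    c = np.zeros(n + 1)
    c[0] = 1.0                                    # minimize theta
    A_in = np.hstack([-X[:, [j0]], X])            # X @ lam - theta * x0 <= 0
    A_out = np.hstack([np.zeros((s, 1)), -Y])     # -Y @ lam <= -y0, i.e. Y @ lam >= y0
    A_ub = np.vstack([A_in, A_out])
    b_ub = np.concatenate([np.zeros(m), -Y[:, j0]])
    bounds = [(None, None)] + [(0, None)] * n     # theta free, lambdas nonnegative
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[0]                               # efficiency score theta*
```

Adding the convexity constraint sum(lam) = 1 via A_eq and b_eq gives the variable-returns-to-scale version that the simulations compare against the CRS assumption.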