101 |
Estimating the Population Standard Deviation based on the Sample Range for Non-normal Data / Li, Yufeng / January 2023
Recently, an increasing number of researchers have attempted to overcome the constraints of size and scope in individual medical studies by estimating overall treatment effects from a combination of studies. A commonly used method is meta-analysis, which combines results from multiple studies. The population standard deviation in primary studies is an essential quantity that is sometimes absent, especially when the outcome has a skewed distribution; instead, the sample size and the sample range of the whole dataset are reported. Several methods exist to estimate the standard deviation of the data from the sample range if we assume the data are normally distributed, for example the Tippett method [2], the Ramirez and Cox method [3], the Hozo et al. method [4], the Rychtar and Taylor method [5], the Mantel method [6], the Sokal and Rohlf method [7], and the Chen and Tyler method [8]. Only a few papers provide a solution for estimating the population standard deviation of non-normally distributed data. In this thesis, several other distributions commonly used in clinical studies will be simulated to estimate the population standard deviation using the methods mentioned above. The performance and robustness of those methods for different sample sizes and different distribution parameters will be presented, and the methods will also be evaluated on real-world datasets. This thesis will provide guidelines describing which methods perform best with non-normally distributed data. / Thesis / Master of Science (MSc)
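As a rough illustration of two of the normal-theory estimators under study (a minimal Python sketch; the function names and the lognormal stress test are my own, and the real methods include refinements not shown here), a Tippett-style estimator divides the observed range by the expected range of n standard normal variables, while a Hozo-style rule of thumb simply takes range/4:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def expected_range_normal(n):
    # E[range of n iid N(0,1)] = integral of 1 - Phi(x)^n - (1 - Phi(x))^n
    f = lambda x: 1.0 - norm.cdf(x) ** n - (1.0 - norm.cdf(x)) ** n
    value, _ = quad(f, -10, 10)
    return value

def sd_from_range_tippett(lo, hi, n):
    """Tippett-style estimator: observed range divided by the expected
    range of n standard normal observations."""
    return (hi - lo) / expected_range_normal(n)

def sd_from_range_rule_of_thumb(lo, hi):
    """Hozo-style rule of thumb: range/4. Both estimators assume
    normality and are biased when the data are skewed."""
    return (hi - lo) / 4.0

# Stress test on a skewed (lognormal) sample, the setting of the thesis.
rng = np.random.default_rng(1)
x = rng.lognormal(mean=0.0, sigma=1.0, size=50)
print("sample SD:", round(x.std(ddof=1), 3))
print("Tippett  :", round(sd_from_range_tippett(x.min(), x.max(), x.size), 3))
print("range/4  :", round(sd_from_range_rule_of_thumb(x.min(), x.max()), 3))
```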
|
102 |
Understanding the Robustness of Self-Supervised Representations / Rodahl Holmgren, Johan / January 2023
This work investigates the robustness of the learned representations of self-supervised learning approaches, focusing on distribution shifts in computer vision. Joint-embedding architectures and method-based self-supervised learning approaches have shown advances in learning representations in a label-free manner and in efficient knowledge transfer toward reducing human annotation needs. However, empirical analysis has largely been limited to downstream-task performance on in-distribution natural scenes. This constrained evaluation does not reflect the detailed comparative performance of learning methods, preventing it from highlighting their limitations and guiding systematic improvement. This work evaluates the robustness of self-supervised learning methods quantitatively and qualitatively on the distribution-shifted and corrupted dataset ImageNet-C. Several self-supervised learning approaches are considered for comprehensiveness, including contrastive learning, knowledge distillation, mutual information maximization, and clustering. A detailed comparative analysis is presented to understand the retention of robustness against varying severities of induced corruption and noise present in the data. This work provides insights into appropriate method selection under different conditions and highlights limitations for future method development.
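For context, robustness on ImageNet-C is commonly summarized by the mean corruption error (mCE) of Hendrycks and Dietterich: each corruption's top-1 error, summed over five severities, is normalized by a baseline model's error and then averaged. A minimal sketch with made-up error rates (the thesis's exact metrics and baselines may differ):

```python
import numpy as np

# Hypothetical top-1 error rates per corruption at five severities; under
# the ImageNet-C protocol each corruption error is normalized by a baseline
# model's error before averaging into the mCE.
model_err = {"gaussian_noise": [.30, .40, .50, .60, .70],
             "fog":            [.25, .30, .35, .45, .55]}
base_err  = {"gaussian_noise": [.40, .50, .60, .70, .80],
             "fog":            [.35, .40, .50, .60, .70]}

def mean_corruption_error(model, baseline):
    # CE_c = (sum of model errors over severities) / (same for baseline)
    return float(np.mean([sum(model[c]) / sum(baseline[c]) for c in model]))

print("mCE:", round(mean_corruption_error(model_err, base_err), 3))
```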
|
103 |
Robust Estimation and Prediction in the Presence of Influential Units in Surveys / Teng, Yizhen / 02 August 2023
In surveys, one may face the problem of influential units at the estimation stage. A unit is said to be influential if its inclusion in or exclusion from the sample has a drastic impact on the estimates. This is a common situation in business surveys, as the distribution of economic variables tends to be highly skewed. We examine some commonly used estimators and predictors of a population total and propose a robust estimator and predictor based on an adaptive tuning constant. The proposed tuning constant is based on the concept of the conditional bias of a unit, which is a measure of influence. We present the results of a simulation study that compares the performance of several estimators and predictors in terms of bias and efficiency.
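To make the conditional-bias idea concrete, the following Python sketch assumes Poisson sampling, under which the approximate conditional bias of a sampled unit for the Horvitz-Thompson total is B_i = (1/pi_i - 1) y_i; influence beyond a tuning constant is dampened with a Huber function, in the spirit of Beaumont, Haziza and Ruiz-Gazen (2013). The adaptive constant shown is a simple min-max-style choice and is my assumption, not necessarily the rule proposed in the thesis:

```python
import numpy as np

def robust_ht_total(y, pi, c):
    """Robust Horvitz-Thompson total under Poisson sampling (a sketch).
    B_i = (1/pi_i - 1) * y_i estimates the conditional bias of sampled
    unit i; values beyond the tuning constant c are clipped by a Huber
    psi-function, which dampens influential units."""
    t_ht = np.sum(y / pi)                  # classical HT estimator
    B = (1.0 / pi - 1.0) * y               # estimated conditional biases
    psi = np.clip(B, -c, c)                # Huber psi
    return t_ht + np.sum(psi - B)

rng = np.random.default_rng(0)
y = rng.lognormal(2.0, 1.0, size=200)      # skewed business-survey variable
y[0] = 5000.0                              # one influential unit
pi = np.full(200, 0.2)                     # Poisson inclusion probabilities

B = (1.0 / pi - 1.0) * y
c_adaptive = 0.5 * (B.min() + B.max())     # min-max style constant (assumption)
print("HT total    :", round(np.sum(y / pi)))
print("robust total:", round(robust_ht_total(y, pi, c_adaptive)))
```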
|
104 |
Probabilistic Methodology for Record Linkage: Determining Robustness of Weights / Jensen, Krista Peine / 20 July 2004
Record linkage is the process that joins separately recorded pieces of information for a particular individual from one or more sources. To facilitate record linkage, a reliable computer-based approach is ideal. In genealogical research, computerized record linkage is useful in combining information for an individual across multiple censuses. In creating a computerized method for linking census records, it needs to be determined whether weights calculated from one geographical area can be used to link records from another geographical area. Research performed by Marcie Francis calculated field weights using census records from 1910 and 1920 for Ascension Parish, Louisiana. These weights were re-calculated to take into account population changes of the time period and then used on five data sets from different geographical locations to determine their robustness. HeritageQuest provided indexed census records on four states, California, Connecticut, Illinois, and Michigan, in addition to Louisiana. Because the record size of California was large and we desired at least five data sets for comparison, this state was split into two groups based on geographical location. Weights for Louisiana were re-calculated to take into consideration Visual Basic code modifications for the fields "Place of Origin", "Age", and "Location" (enumeration district). The validity of these weights was a concern due to the low number of known matches present in the Louisiana data set. Thus, to get a better sense of how weights calculated from a data source with a larger number of known matches would perform, weights were calculated for Michigan census records. Error rates obtained using weights calculated from the Michigan data set were lower than those obtained using Louisiana weights. To determine weight robustness, weights for Southern California were also calculated to allow for comparison between two samples. Error rates acquired using Southern California weights were much lower than either of the previously calculated error rates. This led to the decision to calculate weights for each of the data sets, average them, and use the averaged weights to link each data set, to account for fluctuations of the population between geographical locations. Error rates obtained when using the averaged weights proved robust enough to use in any of the geographical areas sampled. The weights obtained in this project can be used when linking any census records from 1910 and 1920. When linking census records from other decades, it is necessary to calculate new weights to account for specific time-period fluctuations.
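For readers unfamiliar with how such field weights are formed, the classical Fellegi-Sunter scheme assigns each field log2(m/u) when it agrees and log2((1-m)/(1-u)) when it disagrees, where m and u are the probabilities of agreement among true matches and among non-matches. A minimal Python sketch with hypothetical m/u values (the thesis estimates these quantities from known matches in the census data):

```python
import math

def field_weights(m, u):
    """Fellegi-Sunter weights for one field: m is the probability the
    field agrees on a true match, u the probability it agrees on a
    non-match (e.g. by chance)."""
    return math.log2(m / u), math.log2((1 - m) / (1 - u))

# Hypothetical m/u values; in the thesis these are estimated from known
# matches in the 1910 and 1920 census data.
fields = {"surname": (0.95, 0.02), "age": (0.85, 0.10), "birthplace": (0.90, 0.05)}
agreement = {"surname": True, "age": False, "birthplace": True}

composite = 0.0
for name, (m, u) in fields.items():
    agree_w, disagree_w = field_weights(m, u)
    composite += agree_w if agreement[name] else disagree_w
print("composite weight:", round(composite, 2))  # compare to link/non-link cutoffs
```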
|
105 |
Robust MEWMA-type Control Charts for Monitoring the Covariance Matrix of Multivariate Processes / Xiao, Pei / 06 March 2013
In multivariate statistical process control, it is generally assumed that the process variables follow a multivariate normal distribution with mean vector μ and covariance matrix Σ, but this is rarely satisfied in practice. Some robust control charts have been developed to monitor the mean and variance of univariate processes, or the mean vector μ of multivariate processes, but the development of robust multivariate charts for monitoring Σ has not been adequately addressed. The control charts that are most affected by departures from normality are actually the charts for Σ, not the charts for μ. In this article, the robust design of several MEWMA-type control charts for monitoring Σ is investigated. In particular, the robustness and efficiency of different MEWMA-type control charts are compared for the in-control and out-of-control cases over a variety of multivariate distributions. Additionally, the total extra quadratic loss is proposed to evaluate the overall performance of control charts for multivariate processes. / Ph. D.
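As one concrete example of an EWMA-type chart for Σ (a sketch in the spirit of the MEWMC chart of Hawkins and Maboudou-Tchao; the charts compared in this dissertation may differ), the smoothed covariance is tracked and its likelihood-ratio-style distance from the in-control matrix is monitored:

```python
import numpy as np

def mewmc_statistics(X, lam=0.1):
    """EWMA-type monitoring of a covariance matrix (a sketch). Observations
    are assumed standardized so the in-control covariance is the identity.
    S_t smooths the outer products x_t x_t'; the statistic is a
    likelihood-ratio-style distance of S_t from I_p, zero only when S_t
    equals the in-control matrix."""
    n, p = X.shape
    S = np.eye(p)                                  # start at in-control value
    stats = []
    for x in X:
        S = (1 - lam) * S + lam * np.outer(x, x)   # smoothed covariance
        _, logdet = np.linalg.slogdet(S)
        stats.append(np.trace(S) - logdet - p)     # signal when above a limit
    return np.array(stats)

rng = np.random.default_rng(0)
incontrol = rng.standard_normal((100, 3))
shifted = 2.0 * rng.standard_normal((50, 3))       # variance shift at t = 100
C = mewmc_statistics(np.vstack([incontrol, shifted]))
print("in-control mean:", C[:100].mean().round(2),
      " post-shift mean:", C[100:].mean().round(2))
```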
|
106 |
On robustness and explainability of deep learning / Le, Hieu / 06 February 2024
There has been tremendous progress in machine learning, and specifically deep learning, in the last few decades. However, due to the inherent nature of deep neural networks, many questions regarding explainability and robustness remain open. More specifically, since deep learning models are shown to be brittle against malicious changes, when the models fail and how we can construct models that are more robust to these attacks are of high interest. This work tries to answer some of these questions by tackling the problem across four topics. First, real-world datasets often contain noise, which can badly impact classification performance; furthermore, adversarial noise can be crafted to alter classification results. Geometric multi-resolution analysis (GMRA) is capable of capturing and recovering manifolds while preserving geometric features. We showed that GMRA can be applied to retrieve low-dimensional representations, which are more robust to noise and simplify classification models. Second, I showed that adversarial defense in the image domain can be partially achieved, without knowing the specific attack method, by employing a preprocessing model trained on a denoising task. Next, I tackle the problem of adversarial generation in the text domain within the context of real-world applications. I devised a new method of crafting adversarial text using filtered unlabeled data, which is usually more abundant than labeled data. Experimental results showed that the new method creates more natural and relevant adversarial texts than current state-of-the-art methods. Lastly, I presented my work on referring expression generation, aiming at a more explainable natural language model. The proposed method decomposes the referring expression generation task into two subtasks, and experimental results showed that the generated expressions are more comprehensible to human readers. I hope that the approaches proposed here can further our understanding of the explainability and robustness of deep learning models.
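As a minimal illustration of how adversarial noise is crafted (a toy fast-gradient-sign example on a hypothetical linear model; not the text-domain method devised in this work), the input is nudged by a small step in the direction that increases the loss:

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny logistic-regression "model" with fixed weights, standing in for a
# trained classifier (weights here are random, purely for illustration).
w = rng.standard_normal(10)
b = 0.1

def predict_proba(x):
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

def fgsm(x, y, eps):
    """Fast gradient sign method: for the logistic loss, the gradient
    with respect to the input is (p - y) * w; step eps in its sign."""
    grad_x = (predict_proba(x) - y) * w
    return x + eps * np.sign(grad_x)

x = rng.standard_normal(10)
y = 1.0
print("clean prob      :", round(predict_proba(x), 3))
print("adversarial prob:", round(predict_proba(fgsm(x, y, eps=0.3)), 3))
```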
|
107 |
Network coding applications to high bit-rate satellite networks / Giambene, G., Muhammad, M., Luong, D.K., Bacco, M., Gotta, A., Celandroni, N., Jaff, Esua K., Susanto, Misfa, Hu, Yim Fun, Pillai, Prashant, Ali, Muhammad, de Cola, T. / January 2015
Satellite networks are expected to support multimedia traffic flows, offering high capacity with QoS guarantees. However, system efficiency is often impaired by packet losses due to erasure channel effects. Reconfigurable and adaptive air interfaces are possible solutions to alleviate some of these issues. On the other hand, network coding is a promising technique to improve satellite network performance. This position paper reports on potential applications of network coding to satellite networks. Surveys and preliminary numerical results are provided on network coding applications to different exemplary satellite scenarios. Specifically, the adoption of Random Linear Network Coding (RLNC) is considered in three cases, namely multicast transmissions, handover for multihomed aircraft mobile terminals, and multipath TCP-based applications. OSI layers on which the implementation of network coding would potentially yield benefits are also recommended.
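To sketch the RLNC idea: each coded packet is a random linear combination of source packets, so any sufficiently large set of independent coded packets allows decoding regardless of which transmissions were erased. The Python toy below works over GF(2) for brevity (practical RLNC typically uses GF(2^8); the erasure probability and packet sizes are made up):

```python
import numpy as np

def rlnc_encode(packets, n_coded, rng):
    """RLNC over GF(2): each coded packet is the XOR (mod-2 sum) of a
    random subset of source packets; the coefficient vector travels
    with the packet so the receiver can decode."""
    coeffs = rng.integers(0, 2, size=(n_coded, len(packets)), dtype=np.uint8)
    return coeffs, (coeffs @ packets) % 2

def rlnc_decode(coeffs, data):
    """Gaussian elimination over GF(2); returns the source packets once
    the received coefficient vectors span the full space, else None."""
    k = coeffs.shape[1]
    A = np.concatenate([coeffs, data], axis=1).astype(np.uint8)
    r = 0
    for c in range(k):
        pivot = next((i for i in range(r, A.shape[0]) if A[i, c]), None)
        if pivot is None:
            return None                     # rank deficient: wait for more
        A[[r, pivot]] = A[[pivot, r]]       # move pivot row up
        for i in range(A.shape[0]):
            if i != r and A[i, c]:
                A[i] ^= A[r]                # cancel column c elsewhere
        r += 1
    return A[:k, k:]                        # rows now read [I | sources]

rng = np.random.default_rng(7)
src = rng.integers(0, 2, size=(4, 8), dtype=np.uint8)  # 4 source packets, 8 bits
got_c, got_d = [], []
rec = None
while rec is None:                          # receiver collects coded packets
    c, d = rlnc_encode(src, 1, rng)
    if rng.random() < 0.3:                  # erasure channel drops some
        continue
    got_c.append(c[0]); got_d.append(d[0])
    if len(got_c) >= 4:
        rec = rlnc_decode(np.array(got_c), np.array(got_d))
print("recovered:", np.array_equal(rec, src))
```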
|
108 |
Multivariate Functional Data Analysis and Visualization / Qu, Zhuo / 11 1900
As a branch of statistics, functional data analysis (FDA) studies observations regarded as curves, surfaces, or other objects evolving over a continuum. Although there has been a flourishing of methods and theories in FDA, two issues remain: first, most methods assume the functional data are sampled on common time grids; second, methods developed for univariate functional data are challenging to apply to multivariate functional data. After exploring model-based fitting for regularly observed multivariate functional data, we explore new visualization tools, clustering, and multivariate functional depths for irregularly observed (sparse) multivariate functional data. The four main chapters that comprise the dissertation are organized as follows. First, median polish for functional multivariate analysis of variance (FMANOVA) is proposed with the implementation of multivariate functional depths in Chapter 2; numerical studies and environmental datasets illustrate the robustness of median polish. Second, the sparse functional boxplot and the intensity sparse functional boxplot, practical exploratory tools that make visualization possible for both complete and sparse functional data, are introduced in Chapter 3. These visualization tools depict sparseness characteristics through the proportion of sparseness and the relative intensity of fitted sparse points inside the central region, respectively. Third, a robust distance-based two-layer partition (RTLP) clustering of sparse multivariate functional data is introduced in Chapter 4. The RTLP clustering is based on our proposed elastic time distance (ETD), designed specifically for sparse multivariate functional data. Lastly, the multivariate functional integrated depth and the multivariate functional extremal depth, based on multivariate depths, are proposed in Chapter 5. Global and local formulas for each depth are explored, theoretical properties are proved, and finite-sample depth estimation for irregularly observed multivariate functional data is investigated. In addition, the simplified sparse functional boxplot and the simplified intensity sparse functional boxplot for visualization without data reconstruction are introduced. Together, these four extensions to multivariate functional data make these tools more general and of applied interest in exploratory multivariate functional data analysis.
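As a concrete example of the kind of functional depth underlying these tools, the modified band depth of Lopez-Pintado and Romo scores a curve by how often it lies inside bands spanned by pairs of sample curves; the multivariate depths proposed in the dissertation build on such marginal notions. A minimal univariate Python sketch (simulated data, not the dissertation's implementation):

```python
import numpy as np

def modified_band_depth(curves):
    """Modified band depth (MBD): for each curve, the average (over all
    pairs of sample curves) proportion of time points at which the curve
    lies inside the band spanned by the pair. Deep curves are central;
    outlying curves get low depth."""
    n, T = curves.shape
    depth = np.zeros(n)
    for j in range(n):
        inside, pairs = 0.0, 0
        for a in range(n):
            for b in range(a + 1, n):
                lo = np.minimum(curves[a], curves[b])
                hi = np.maximum(curves[a], curves[b])
                inside += np.mean((curves[j] >= lo) & (curves[j] <= hi))
                pairs += 1
        depth[j] = inside / pairs
    return depth

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 50)
X = np.sin(2 * np.pi * t) + 0.3 * rng.standard_normal((20, 50))
X[0] += 3.0                                   # one outlying curve
d = modified_band_depth(X)
print("outlier depth:", d[0].round(3), " deepest curve:", d.max().round(3))
```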
|
109 |
ADDRESSING DRIVER CONCERNS: THE NETWORK ROBUSTNESS INDEX APPROACH TO PLANNING CITY CYCLING INFRASTRUCTURE / Burke, Charles / January 2017
On congested North American urban road networks, driver concerns over increased travel time play a major role in whether or not cycling infrastructure is built. This fact is recognized by transportation planning agencies in Canada and the United States, including the Ministry of Transportation Ontario and the Federal Highway Administration. However, specific frameworks to address such driver concerns exist in neither the practice of urban planning nor the academic literature.
One potentially fruitful avenue is to explore the methods and tools of critical link analysis. One such method is the Network Robustness Index (NRI), with its accompanying Network Robustness Index Calculator, which ranks links from least to most critical through traffic simulation. The information that can be used to address driver concerns is found in the least critical links, as these roadways have additional capacity and may therefore be considered underutilized.
This thesis explores the use of the NRI as a framework for urban cycling infrastructure planning. Experiments comparing the utility of the NRI against common traffic and cycling planning tools are presented. The NRI Calculator's ability to perform full network scans for potential bike lane locations, least-cost corridors, and full cycling networks of different designs is tested throughout the chapters of this manuscript. / Thesis / Doctor of Philosophy (PhD) / This thesis aids in the planning of urban bike lanes by addressing driver concerns through traffic simulation.
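A stripped-down sketch of the NRI logic (using networkx, with shortest-path travel times on a toy network standing in for the calculator's traffic simulation; the network, demand table, and weights are hypothetical):

```python
import networkx as nx

def network_robustness_index(G, demand, weight="travel_time"):
    """For each link, the change in total system travel time when that
    link is removed; the full NRI re-runs a traffic assignment, which
    this sketch approximates with shortest paths for a fixed OD demand.
    Links with an index near zero are the underutilized candidates for
    reallocating road space to bike lanes."""
    def system_travel_time(H):
        return sum(trips * nx.shortest_path_length(H, o, d, weight=weight)
                   for (o, d), trips in demand.items())

    base = system_travel_time(G)
    nri = {}
    for e in list(G.edges()):
        H = G.copy()
        H.remove_edge(*e)
        try:
            nri[e] = system_travel_time(H) - base
        except nx.NetworkXNoPath:
            nri[e] = float("inf")            # removing e disconnects demand
    return nri

# Toy 4-node network with one redundant link.
G = nx.Graph()
G.add_weighted_edges_from(
    [(1, 2, 4), (2, 3, 4), (1, 3, 10), (3, 4, 2)], weight="travel_time")
demand = {(1, 4): 100, (2, 4): 50}
for e, v in sorted(network_robustness_index(G, demand).items(),
                   key=lambda kv: kv[1]):
    print(e, v)      # link (1, 3) scores 0: a candidate for a bike lane
```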
|
110 |
Criticism and robustification of latent Gaussian models / Cabral, Rafael / 28 May 2023
Latent Gaussian models (LGMs) are perhaps the most commonly used class of statistical models, with broad applications in various fields, including biostatistics, econometrics, and spatial modeling. LGMs assume that a set of unobserved or latent variables follows a Gaussian distribution; these latent variables are commonly used to model spatial and temporal dependence in the data. The availability of computational tools, such as R-INLA, that permit fast and accurate estimation of LGMs has made their use widespread. Nevertheless, it is easy to find datasets that contain inherently non-Gaussian features, such as sudden jumps or spikes, that adversely affect the inferences and predictions made from an LGM. These datasets require more general latent non-Gaussian models (LnGMs) that can automatically handle such non-Gaussian features by assuming more flexible and robust non-Gaussian distributions for the latent variables. However, fast implementations and easy-to-use software are lacking, which prevents LnGMs from becoming widely applicable.
This dissertation aims to tackle these challenges and provide ready-to-use implementations for the R-INLA package. We view scientific learning as an iterative process involving model criticism followed by model improvement and robustification. Thus, the first step is to provide a framework that allows researchers to criticize and check the adequacy of an LGM without fitting the more expensive LnGM. We employ concepts from Bayesian sensitivity analysis to check the influence of the latent Gaussian assumption on the statistical answers and Bayesian predictive checking to check if the fitted LGM can predict important features in the data. In many applications, this procedure will suffice to justify using an LGM. For cases where this check fails, we provide fast and scalable implementations of LnGMs based on variational Bayes and Laplace approximations. The approximation leads to an LGM that downweights extreme events in the latent variables, reducing their impact and leading to more robust inferences. Each step, the first of LGM criticism and the second of LGM robustification, can be executed in R-INLA, requiring only the addition of a few lines of code. This results in a robust workflow that applied researchers can readily use.
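To see why downweighting extreme latent events robustifies inference, consider a numpy caricature (not R-INLA code; the Huber penalty below merely stands in for the dissertation's non-Gaussian latent distributions, and all data are simulated) comparing MAP smoothing of a series containing a jump under a Gaussian random-walk penalty versus a heavier-tailed one:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n = 100
truth = np.zeros(n); truth[60:] = 4.0        # latent path with a sudden jump
y = truth + 0.5 * rng.standard_normal(n)     # noisy observations

def huber(r, c=0.5):
    # Quadratic near zero, linear in the tails: extreme increments
    # (the jump) are penalized far less than under a Gaussian prior.
    a = np.abs(r)
    return np.where(a <= c, 0.5 * r**2, c * a - 0.5 * c**2)

def objective(x, penalty):
    fit = 0.5 * np.sum((y - x) ** 2)         # Gaussian observation term
    dx = np.diff(x)                          # latent increments
    rough = np.sum(0.5 * dx**2) if penalty == "gaussian" else np.sum(huber(dx))
    return fit + 10.0 * rough

for penalty in ("gaussian", "huber"):
    x_hat = minimize(objective, y.copy(), args=(penalty,), method="L-BFGS-B").x
    print(penalty, "mean abs error:", round(np.abs(x_hat - truth).mean(), 3))
```

The Gaussian penalty smears the jump across neighboring points, while the heavier-tailed penalty absorbs it in a single large increment, mirroring the robustness gain the dissertation targets.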
|