141 |
On the Asymptotic Theory of Permutation Statistics
Strasser, Helmut, Weber, Christian January 1999
In this paper limit theorems for the conditional distributions of linear test statistics are proved. The assertions are conditioned by the sigma-field of permutation symmetric sets. Limit theorems are proved both for the conditional distributions under the hypothesis of randomness and under general contiguous alternatives with independent but not identically distributed observations. The proofs are based on results on limit theorems for exchangeable random variables by Strasser and Weber. The limit theorems under contiguous alternatives are consequences of an LAN-result for likelihood ratios of symmetrized product measures. The results of the paper have implications for statistical applications. By example it is shown that minimum variance partitions which are defined by observed data (e.g. by LVQ) lead to asymptotically optimal adaptive tests for the k-sample problem. As another application it is shown that conditional k-sample tests which are based on data-driven partitions lead to simple confidence sets which can be used for the simultaneous analysis of linear contrasts. (author's abstract) / Series: Report Series SFB "Adaptive Information Systems and Modelling in Economics and Management Science"
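As a rough illustration of the object these limit theorems describe, the sketch below approximates the permutation (conditional) distribution of a linear statistic by Monte Carlo. It is not taken from the paper: the two-sample difference in means, the function names, and the defaults `n_perm` and `seed` are assumptions chosen for the example.

```python
import random


def linear_statistic(values, labels):
    """Difference in group means, written as a linear statistic sum(c_i * x_i)."""
    n1 = sum(labels)
    n0 = len(labels) - n1
    return sum(v / n1 if g else -v / n0 for v, g in zip(values, labels))


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Two-sided Monte Carlo permutation p-value under the hypothesis of randomness."""
    rng = random.Random(seed)
    values = list(x) + list(y)
    labels = [1] * len(x) + [0] * len(y)
    observed = abs(linear_statistic(values, labels))
    hits = 0
    for _ in range(n_perm):
        # Conditioning on the observed values: only the group labels are permuted.
        rng.shuffle(labels)
        # Small tolerance guards against floating-point ties with the observed value.
        if abs(linear_statistic(values, labels)) >= observed - 1e-12:
            hits += 1
    # The "+1" terms count the observed arrangement itself as one permutation.
    return (1 + hits) / (1 + n_perm)
```

Calling `permutation_p_value(sample_a, sample_b)` returns a value in (0, 1]; for small samples the exact permutation distribution could be enumerated instead of sampled.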
|
142 |
The Role of Inference in Second Language Reading Comprehension: Developing Inferencing Skill Through Extensive Reading
Niwa, Sayako 02 July 2019
The purpose of this study is to determine whether extensive reading has positive effects on developing inferencing skills. Extensive reading is a language learning method of reading large amounts of comprehensible texts. This method limits the use of dictionaries while reading; therefore, extensive readers have more practice in dealing with unfamiliar words than non-extensive readers. One of the ways to deal with unfamiliar words is to infer the meaning of the word using contextual clues. Knowing how to infer the meaning of unknown words is a helpful skill for language learners. Because extensive readers have more practice in dealing with unknown words, this study examines whether the precision of inferencing skills differs between extensive readers and non-extensive readers. The study analyzed 39 participants: 28 non-extensive readers and 11 extensive readers. The results showed that extensive reading has positive effects on language learners’ inferencing skills. In terms of accuracy, the difference was not statistically significant; however, the extensive readers had a higher percentage of accurate inferences of word meaning. In terms of the use of knowledge sources, extensive readers were able to choose the appropriate knowledge source when inferring the target word. These results indicate that extensive reading can enhance language learners’ inferencing skills.
|
143 |
Prediction and Anomaly Detection Techniques for Spatial Data
Liu, Xutong 11 June 2013
With increasing public sensitivity to and concern about environmental issues, huge amounts of spatial data have been collected, from location-based social network applications to scientific data. This has encouraged the formation of large spatial data sets and generated considerable interest in identifying novel and meaningful patterns. Allowing correlated observations weakens the usual statistical assumption of independent observations and complicates spatial analysis. This research focuses on the construction of efficient and effective approaches for three main mining tasks: spatial outlier detection, robust inference for spatial data sets, and spatial prediction for large multivariate non-Gaussian data.
Spatial outlier analysis, which aims at detecting abnormal objects in spatial contexts, can help extract important knowledge in many applications. Most existing approaches suffer from the well-known masking and swamping problems and still cannot satisfy certain recently raised requirements. This research develops spatial outlier detection techniques in three areas: spatial numerical outlier detection, spatial categorical outlier detection, and identification of the number of spatial numerical outliers.
First, this report introduces random walk based approaches to identify spatial numerical outliers. Bipartite and Exhaustive Combination weighted graphs are modeled based on spatial and/or non-spatial attributes, and random walk techniques are then performed on the graphs to compute the relevance among objects. The objects with lower relevance are recognized as outliers. Second, an entropy-based method is proposed to estimate the optimum number of outliers. According to entropy theory, we expect that, by incrementally removing outliers, the entropy value will decrease sharply and reach a stable state when all the outliers have been removed. Finally, this research designs several Pair Correlation Function based methods to detect spatial categorical outliers for both single and multiple attribute data. Among them, the Pair Correlation Ratio (PCR) is defined and estimated for each pair of categorical combinations based on their co-occurrence frequency at different spatial distances. The observations with the lower PCRs are diagnosed as potential spatial categorical outliers (SCOs).
Spatial kriging is a widely used predictive model whose predictive accuracy could be significantly compromised if the observations are contaminated by outliers. Also, due to spatial heterogeneity, observations are often of different types. The prediction of multivariate spatial processes plays an important role when there are cross-spatial dependencies between multiple responses. In addition, given the large volume of spatial data, prediction is computationally challenging. These raise three research topics: 1) robust prediction for spatial data sets; 2) prediction of multivariate spatial observations; and 3) efficient processing for large data sets.
First, the robustness of the spatial kriging model can be systematically increased by integrating heavy-tailed distributions. However, inference then becomes analytically intractable. Here, we present a novel Robust and Reduced Rank Spatial Kriging Model (R$^3$-SKM), which is resilient to the influence of outliers and allows for fast spatial inference. Second, this research introduces a flexible hierarchical Bayesian framework that permits the simultaneous modeling of mixed-type variables. Specifically, the mixed-type attributes are mapped to latent numerical random variables that are multivariate Gaussian in nature. Finally, knot-based techniques are utilized to model the predictive process as a reduced rank spatial process, which projects the process realizations of the spatial model to a lower dimensional subspace. This projection significantly reduces the computational cost. / Ph. D.
|
144 |
The effect of sugar-sweetened beverage consumption on childhood obesity - causal evidence
Yang, Yan 18 May 2016
Indiana University-Purdue University Indianapolis (IUPUI) / Communities and states are increasingly targeting the consumption of sugar-sweetened beverages (SSBs), especially soda, in their efforts to curb childhood obesity. However, the empirical evidence on which policy makers base the relevant policies is not causally interpretable. In the present study, we suggest a modeling framework that can be used for causal estimation and inference in the context of childhood obesity. This modeling framework is built upon the two-stage residual inclusion (2SRI) instrumental variables method and has two levels: level one models children’s lifestyle choices, and level two models children’s energy balance, which is assumed to depend on their lifestyle behaviors.
We start with a simplified version of the model that includes only one policy, one lifestyle, one energy balance, and one observable control variable. We then extend this simple version to a general one that accommodates multiple policy and lifestyle variables. The two versions of the model are first estimated via the nonlinear least squares (NLS) method (henceforth NLS-based 2SRI) and then via the maximum likelihood estimation (MLE) method (henceforth MLE-based 2SRI). Using simulated data, we show that 1) our proposed 2SRI method outperforms the conventional methods that ignore either the inherent nonlinearity [the linear instrumental variables (LIV) method] or the potential endogeneity [the nonlinear regression (NR) method] in obtaining the relevant estimators; and 2) the MLE-based 2SRI provides more efficient (and still consistent) estimators than the NLS-based one. Real data analysis is conducted to illustrate the implementation of the 2SRI method in practice using both NLS and MLE methods. However, due to data limitations, we are not able to draw any inference regarding the impacts of lifestyle, specifically SSB consumption, on childhood obesity. We are in the process of getting better data and, after doing so, we will replicate and extend the analyses conducted here. These analyses, we believe, will produce causally interpretable evidence of the effects of SSB consumption and other lifestyle choices on childhood obesity. The empirical analyses presented in this dissertation should, therefore, be viewed as an illustration of our newly proposed framework for causal estimation and inference.
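The two-stage residual inclusion idea can be sketched on simulated data. The example below is a deliberately simplified, linear illustration and not the dissertation's nonlinear NLS- or MLE-based estimator: stage one regresses the endogenous variable on an instrument, and stage two adds the stage-one residual as an extra regressor. All names (`ols`, `simulate`, `two_stage_residual_inclusion`) and the data-generating coefficients are assumptions made up for this sketch.

```python
import random


def ols(columns, y):
    """Least-squares coefficients of y on the given columns (intercept first)."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Augmented normal equations [X'X | X'y].
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(k)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for p in range(k):
        best = max(range(p, k), key=lambda r: abs(A[r][p]))
        A[p], A[best] = A[best], A[p]
        for r in range(k):
            if r != p:
                f = A[r][p] / A[p][p]
                A[r] = [a - f * b for a, b in zip(A[r], A[p])]
    return [A[r][k] / A[r][r] for r in range(k)]


def two_stage_residual_inclusion(z, x, y):
    """Linear 2SRI: return the stage-two coefficient on the endogenous x."""
    a0, a1 = ols([z], x)                       # stage 1: x on instrument z
    resid = [xi - (a0 + a1 * zi) for xi, zi in zip(x, z)]
    _, b_x, _ = ols([x, resid], y)             # stage 2: y on x and residual
    return b_x


def simulate(n, seed=0):
    """Hypothetical data: u is an unobserved confounder, true effect of x is 1.5."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        zi, u = rng.gauss(0, 1), rng.gauss(0, 1)
        xi = 1.0 + 0.8 * zi + u + rng.gauss(0, 1)
        z.append(zi)
        x.append(xi)
        y.append(2.0 + 1.5 * xi + 2.0 * u + rng.gauss(0, 1))
    return z, x, y
```

On data from `simulate`, the naive slope `ols([x], y)[1]` is biased upward by the confounder, while `two_stage_residual_inclusion(z, x, y)` should land close to the assumed true effect of 1.5. In this linear special case the estimate coincides with two-stage least squares; the nonlinear setting treated in the dissertation is where 2SRI and the linear IV method differ.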
|
145 |
Empirical studies of online markets: the impact of product page cues on consumer decisions
Banerjee, Shrabastee 14 May 2021
The widespread expansion of online markets in the past decade poses several questions for platforms, firms and customers alike. An important dimension to be explored in this domain is the provision of information on e-commerce platforms: given the increasing ease with which product pages can be customized to include a vast variety of content, how do these pieces of information interact? Further, what are the specific channels through which this information eventually influences consumer decision-making? My dissertation is situated in this space, and aims to look at how consumers respond to various “cues” that are being introduced by e-commerce platforms which offer products or services that can be purchased online, and how these cues might eventually influence decision-making. In my first dissertation project, the cue I focus on is user generated content. More specifically, I study how the introduction of the Q&A technology (which enables customers to ask product-specific questions before purchase, and receive answers either from other customers or the platform itself) affects the more widely established reviews and ratings feature on e-commerce platforms. I find that the addition of Q&As leads to better matches between customers and products, higher customer satisfaction, and consequently higher ratings. My second project examines another cue that is common in online markets, which is the advertised reference price. My goal in this project is to examine how users react to a specific variant of such prices, namely the “Starting from...” price, using data from a large scale field experiment conducted on Holidu.com. My results indicate that raising “From” prices gives users a more accurate price estimate, but it negatively impacts outbound clicks and other engagement metrics. Taken together, the two projects aim to shed light on factors that influence consumer decision-making in an e-commerce setting, and the possible mechanisms underlying this influence.
|
146 |
Static Evaluation of Type Inference and Propagation on Global Variables with Varying Context
Frasure, Ivan 06 June 2019
No description available.
|
147 |
Estimation and the Stress-Strength Model
Brownstein, Naomi 01 January 2007
The paper considers statistical inference for R = P(X < Y) in the case when both X and Y have generalized gamma distributions. The maximum likelihood estimators for R are developed both when all three parameters of the generalized gamma distributions are unknown and when the shape parameters are known. In addition, objective Bayes estimators based on noninformative priors are constructed when the shape parameters are known. Finally, the uniform minimum variance unbiased estimators (UMVUE) are derived in the case when only the scale parameters are unknown.
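To make the quantity R = P(X < Y) concrete, here is a minimal simulation sketch. It does not implement the paper's maximum likelihood, Bayes, or UMVUE estimators; it uses a plain nonparametric estimate (the fraction of pairs with x < y). The function names and the Stacy-style parameterization (scale, d, p) are assumptions introduced for this example.

```python
import bisect
import random


def sample_generalized_gamma(rng, scale, d, p, size):
    """Draw generalized gamma variates (assumed Stacy form).

    If G ~ Gamma(shape=d/p, scale=1), then scale * G**(1/p) has density
    proportional to x**(d-1) * exp(-(x/scale)**p).
    """
    return [scale * rng.gammavariate(d / p, 1.0) ** (1.0 / p) for _ in range(size)]


def estimate_r(xs, ys):
    """Nonparametric estimate of R = P(X < Y): share of (x, y) pairs with x < y."""
    sorted_ys = sorted(ys)
    m = len(sorted_ys)
    # For each x, count the y values strictly greater than x.
    wins = sum(m - bisect.bisect_right(sorted_ys, x) for x in xs)
    return wins / (len(xs) * m)
```

For instance, `estimate_r([1, 2], [3, 0])` gives 0.5, since two of the four pairs satisfy x < y. With d = p = 1 the generalized gamma reduces to an exponential distribution, for which R has the closed form scale_Y / (scale_X + scale_Y); that special case is a convenient sanity check for the simulation.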
|
148 |
Sequential Inference and Goodness of Fit Testing using Energy Statistics for the Power Normal and Modified Power Normal Distributions
Craig, Bradley 11 August 2023
No description available.
|
149 |
FROntIER: A Framework for Extracting and Organizing Biographical Facts in Historical Documents
Park, Joseph 01 January 2015
The tasks of entity recognition through ontological commitment, fact extraction and organization with respect to a target schema, and entity deduplication have all been examined in recent years, and systems exist that can perform each individual task. A framework combining all these tasks, however, is still needed to accomplish the goal of automatically extracting and organizing biographical facts about persons found in historical documents into disambiguated entity records. We introduce FROntIER (Fact Recognizer for Ontologies with Inference and Entity Resolution) as the framework to recognize and extract facts using an ontology and organize facts of interest through inferring implicit facts using inference rules, a target ontology, and entity resolution. We give two case studies of FROntIER's performance over a few select pages from The Ely Ancestry [BEV02] and Index to The Register of Marriages and Baptisms in the Parish of Kilbarchan, 1649-1772 [Gra12].
|
150 |
Analysis of the impact on phylogenetic inference of non-reversible nucleotide substitution models
Sianga, Rita 12 September 2023
Most phylogenetic trees are inferred using time-reversible evolutionary models that assume that the relative rates of substitution for any given pair of nucleotides are the same regardless of the direction of the substitutions. However, there is no reason to assume that the underlying biochemical mutational processes that cause substitutions are similarly symmetrical. Here, I evaluate the effect on phylogenetic inference in empirical viral and simulated data of incorporating non-reversibility into models of nucleotide substitution processes. I consider two non-reversible nucleotide substitution models: (1) a 6-rate non-reversible model (NREV6) that is applicable to analyzing mutational processes in double-stranded genomes in that complementary substitutions occur at identical rates; and (2) a 12-rate non-reversible model (NREV12) that is applicable to analyzing mutational processes in single-stranded (ss) genomes in that all substitution types are free to occur at different rates. Using likelihood ratio and Akaike Information Criterion-based model tests, I show that, surprisingly, NREV12 provided a significantly better fit than the General Time Reversible (GTR) and NREV6 models to 21/31 dsRNA and 20/30 dsDNA datasets. As expected, however, NREV12 provided a significantly better fit to 24/33 ssDNA and 40/47 ssRNA datasets. I tested how non-reversibility impacts the accuracy with which phylogenetic trees are inferred. As simulated degrees of non-reversibility (DNR) increased, the tree topology inferences using both NREV12 and GTR became more accurate, whereas inferred tree branch lengths became less accurate. I conclude that while non-reversible models should be helpful in the analysis of mutational processes in most virus species, there is no pressing need to use these models for routine phylogenetic inference. Finally, I introduce a web application, RpNRM, that roots phylogenetic trees using a non-reversible nucleotide substitution model. The phylogenetic tree is rooted on every branch, and the likelihood of each rooting is determined and compared, with the highest-likelihood tree identified as the one with the most plausible rooting. The rooting accuracy of RpNRM was compared to that of the outgroup rooting method, the midpoint rooting method and another non-reversible model-based rooting method implemented in the program IQTREE. I find that although the RpNRM and IQTREE non-reversible model-based methods are not as accurate on their own as outgroup or midpoint rooting methods, they nevertheless provide an independent means of verifying the root locations that are inferred by these other methods.
|