1 |
Using Neural Networks to Classify Discrete Circular Probability Distributions
Gaumer, Madelyn 01 January 2019
Given the rise in the application of neural networks to a wide range of problems, it seems natural to apply them to statistical tests. This senior thesis studies whether neural networks built to classify discrete circular probability distributions can outperform a class of well-known statistical tests of uniformity for discrete circular data that includes the Rayleigh test [1], the Watson test [2], and the Ajne test [3]. Each neural network used is relatively small, with no more than 3 layers: an input layer taking in discrete data sets on a circle, a hidden layer, and an output layer producing probability values between 0 and 1, with 0 mapping to uniform and 1 mapping to nonuniform. In evaluating performance, I compare the accuracy, type I error, and type II error of this class of statistical tests and of the neural networks built to compete with them.
[1] Jammalamadaka, S. Rao; SenGupta, A. Topics in Circular Statistics. With 1 IBM-PC floppy disk (3.5 inch; HD). Series on Multivariate Analysis, 5. World Scientific Publishing Co., Inc., River Edge, NJ, 2001. xii+322 pp. ISBN: 981-02-3778-2
[2] Watson, G. S. Goodness-of-fit tests on a circle. II. Biometrika 49 (1962), 57–63.
[3] Ajne, B. A simple test for uniformity of a circular distribution. Biometrika 55 (1968), 343–354.
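As an illustration of the kind of classical test the networks are compared against, the Rayleigh test can be sketched in a few lines. This is a minimal sketch, not the thesis's implementation: the function name `rayleigh_test` is hypothetical, the input is assumed to be a list of angles in radians, and the p-value uses the common large-sample approximation p ≈ exp(-Z), which may differ from the exact procedure used in the thesis.

```python
import math

def rayleigh_test(angles):
    """Rayleigh test of circular uniformity (hypothetical helper).

    angles: observations in radians.
    Returns (z, p) where z = n * R_bar**2 and p is the large-sample
    approximation exp(-z). Small p suggests a nonuniform distribution.
    """
    n = len(angles)
    c = sum(math.cos(a) for a in angles)
    s = sum(math.sin(a) for a in angles)
    r_bar = math.hypot(c, s) / n      # mean resultant length, in [0, 1]
    z = n * r_bar ** 2                # Rayleigh statistic
    p = math.exp(-z)                  # first-order approximation only
    return z, p

# Twelve observations spread evenly over the circle versus twelve at one angle.
evenly_spread = [2 * math.pi * k / 12 for k in range(12)]
concentrated = [0.3] * 12
z_u, p_u = rayleigh_test(evenly_spread)
z_c, p_c = rayleigh_test(concentrated)
```

Rejecting uniformity when p falls below a chosen level gives the type I and type II error rates against which a classifier's errors can be compared.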
|
2 |
The Document Similarity Network: A Novel Technique for Visualizing Relationships in Text Corpora
Baker, Dylan 01 January 2017
With the abundance of written information available online, it is useful to be able to automatically synthesize and extract meaningful information from text corpora. We present a unique method for visualizing relationships between documents in a text corpus. By using Latent Dirichlet Allocation to extract topics from the corpus, we create a graph whose nodes represent individual documents and whose edge weights indicate the distance between topic distributions in documents. These edge lengths are then scaled using multidimensional scaling techniques, such that more similar documents are clustered together. Applying this method to several datasets, we demonstrate that these graphs are useful in visually representing high-dimensional document clustering in topic-space.
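The graph construction described above can be sketched as follows. This is a minimal sketch under stated assumptions: the abstract does not say which distance between topic distributions is used, so the Hellinger distance is chosen here purely for illustration, and the names `hellinger` and `document_graph` are hypothetical. The topic mixtures are assumed to come from an already fitted LDA model; the multidimensional scaling step is not shown.

```python
import math
from itertools import combinations

def hellinger(p, q):
    """Hellinger distance between two discrete distributions (0 = identical, 1 = disjoint)."""
    return math.sqrt(sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)) / 2.0)

def document_graph(topic_mix):
    """Map each unordered pair of document ids to an edge weight.

    topic_mix: dict of document id -> topic distribution (assumed LDA output).
    The Hellinger distance is an illustrative assumption, not the thesis's choice.
    """
    return {
        (i, j): hellinger(topic_mix[i], topic_mix[j])
        for i, j in combinations(sorted(topic_mix), 2)
    }

# Toy topic mixtures over two topics (made up for illustration).
topic_mix = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 0.0]}
edges = document_graph(topic_mix)
```

Documents with identical topic mixtures get an edge weight of 0 and documents sharing no topics get a weight of 1, so a layout method that respects these weights places similar documents close together.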
|
3 |
Recursive Partitioning of Models of a Generalized Linear Model Type
Rusch, Thomas 10 June 2012
This thesis is concerned with recursive partitioning of models of a generalized linear model type (GLM-type), i.e., maximum likelihood models with a linear predictor for the linked mean, a topic that has received constant interest over the last twenty years. The resulting tree (a "model tree") can be seen as an extension of classic trees that allows for a GLM-type model in the partitions. In this work, the focus lies on applied and computational aspects of model trees with GLM-type node models, in order to work out different areas where applying the combination of parametric models and trees will be beneficial and to build a computational scaffold for future application of model trees. In the first part, model trees are defined and some algorithms for fitting model trees with GLM-type node models are reviewed and compared in terms of their properties of tree induction and node model fitting. Additionally, the design of a particularly versatile algorithm, the MOB algorithm (Zeileis et al. 2008) in R, is described and an in-depth discussion of how the functionality offered can be extended to various GLM-type models is provided. This is highlighted by an example of using partitioned negative binomial models for investigating the effect of health care incentives. Part 2 consists of three research articles where model trees are applied to different problems that frequently occur in the social sciences. The first uses trees with GLM-type node models and applies them to a data set of voters, who show a non-monotone relationship between the frequency of attending past elections and the turnout in 2004. Three different types of model tree algorithms are used to investigate this phenomenon, and for two of them the resulting trees can explain the counter-intuitive finding. Here model trees are used to learn a nonlinear relationship between a target model and a large number of candidate variables to provide more insight into a data set.
A second application area is also discussed, namely using model trees to detect ill-fitting subsets in the data. The second article uses model trees to model the number of fatalities in the Afghanistan war, based on the WikiLeaks Afghanistan war diary. Data pre-processing with a topic model generates predictors that are used as explanatory variables in a model tree for overdispersed count data. Here the combination of model trees and topic models allows flexible analysis of database data, frequently encountered in data journalism, and provides a coherent description of fatalities in the Afghanistan war. The third paper uses a new framework built around model trees to approach the classic problem of segmentation, frequently encountered in marketing and management science. Here, the framework is used for segmentation of a sample of the US electorate for identifying likely and unlikely voters. It is shown that the framework's model trees enable accurate identification, which in turn allows efficient targeted mobilisation of eligible voters. (author's abstract)
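The idea of a model tree, a separate parametric model in each partition, can be illustrated with a single split. This is a simplified sketch and not the MOB algorithm: MOB selects split variables with parameter instability tests, whereas the hypothetical `best_split` below simply tries every threshold on one partitioning variable and keeps the one with the lowest total squared error. The node model is an ordinary least-squares line, used here as the simplest stand-in for a GLM-type model; all names and data are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b, sse)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx > 0 else 0.0
    a = my - b * mx
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, sse

def best_split(xs, ys, zs, min_size=3):
    """Search thresholds on the partitioning variable z.

    Returns (total_sse, threshold, (a_left, b_left), (a_right, b_right))
    for the split minimising the summed squared error of the two node
    models, or None if no split leaves min_size points on both sides.
    """
    best = None
    for t in sorted(set(zs))[:-1]:
        left = [i for i, z in enumerate(zs) if z <= t]
        right = [i for i, z in enumerate(zs) if z > t]
        if len(left) < min_size or len(right) < min_size:
            continue
        fl = fit_line([xs[i] for i in left], [ys[i] for i in left])
        fr = fit_line([xs[i] for i in right], [ys[i] for i in right])
        total = fl[2] + fr[2]
        if best is None or total < best[0]:
            best = (total, t, fl[:2], fr[:2])
    return best

# Toy data: y = 1 + 2x when z <= 4, and y = 5 - x when z > 4.
xs = [0, 1, 2, 3, 0, 1, 2, 3]
zs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1, 3, 5, 7, 5, 4, 3, 2]
total_sse, threshold, left_model, right_model = best_split(xs, ys, zs)
```

Applying the same search recursively inside each partition would grow a deeper tree; a single global line fitted to this toy data would miss the change in slope that the split recovers.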
|
4 |
Modeling and projecting Nepal's Mortality and Fertility
Devkota, Jyoti U. 26 September 2000
The objective behind this study was to mathematically analyse, model and forecast the vital rates (mortality and fertility)
of Nepal. In order to attain this goal, the data have been converted into tables and analysed intensively using several
software packages such as Microsoft Excel, SPSS and Mathematica. The margin of error of the data has been analysed. In Chapter 4, the
error and uncertainty in the data have been analysed using Bayesian analysis. The reliability of the data of Nepal has been
compared with the reliability of the data of Germany. The mortality and fertility conditions of Nepal have been compared
from two angles. Data on India (particularly north India) have provided comparison on the socio-economic grounds
whereas data on Germany (with accurate and abundant data) have provided comparison on the ground of data availability
and accuracy. Thus in addition to analysing and modeling the data, the regional behaviour has been studied. The limited and
defective data of Nepal have posed a challenge at every stage and phase. Because of this, very long-term forecasting of
mortality could not be made. But the model has provided a lot of information on the mortality for the years for which the
data were lacking. In the future, with new data at hand and with the new models developed here, it could be
possible to do long-term projections. In the less developed world, rural and urban areas have a big impact on the mortality
and fertility of a country. The rural and urban effects on mortality and fertility have been studied individually. While
analyzing the mortality scene of Nepal, it has been observed that the mortality is decreasing. The decrease is slow, but it
reflects the advancement in medical facilities and health awareness. The fertility is also decreasing. There is a decrease in
the number of children per woman and per family. This decrease is more pronounced in the urban areas as compared to the
rural areas. This also reflects that the family planning programmes launched are showing results, particularly in urban
areas.
|
5 |
Time Series Analysis informed by Dynamical Systems Theory
Schumacher, Johannes 11 June 2015
This thesis investigates time series analysis tools for prediction, as well as detection and characterization of dependencies, informed by dynamical systems theory.
Emphasis is placed on the role of delays with respect to information processing
in dynamical systems, as well as with respect to their effect in causal interactions between systems.
The three main features that characterize this work are, first, the assumption that
time series are measurements of complex deterministic systems. As a result, functional mappings for statistical models in all methods are justified by concepts from
dynamical systems theory. To bridge the gap between dynamical systems theory and data, differential topology is employed in the analysis. Second, the Bayesian paradigm of statistical inference is used to formalize uncertainty by means of a consistent
theoretical apparatus with axiomatic foundation. Third, the statistical models
are strongly informed by modern nonlinear concepts from machine learning and nonparametric modeling approaches, such as Gaussian process theory. Consequently,
unbiased approximations of the functional mappings implied by the prior system level analysis can be achieved.
Applications are considered foremost with respect to computational neuroscience
but extend to generic time series measurements.
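One standard way in which delays connect time series data to dynamical systems theory is delay-coordinate embedding, where each point in a reconstructed state space is built from lagged copies of a scalar measurement. The sketch below illustrates that general idea only; the abstract does not state that this exact construction is used, and the function name `delay_embed` and its parameters `dim` and `tau` are hypothetical.

```python
def delay_embed(series, dim, tau):
    """Build delay vectors (x[t], x[t - tau], ..., x[t - (dim - 1) * tau]).

    series: scalar measurements in time order.
    dim: embedding dimension (number of coordinates per vector).
    tau: delay, in samples, between consecutive coordinates.
    """
    start = (dim - 1) * tau  # first index with a full history available
    return [
        tuple(series[t - k * tau] for k in range(dim))
        for t in range(start, len(series))
    ]

# Six samples, three coordinates per vector, delay of two samples.
vectors = delay_embed([0, 1, 2, 3, 4, 5], dim=3, tau=2)
```

The resulting vectors can serve as inputs to a regression model, for example a Gaussian process, that predicts the next value of the series.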
|