1 |
A Statistical Evaluation of Algorithms for Independently Seeding Pseudo-Random Number Generators of Type Multiplicative Congruential (Lehmer-Class). Stewart, Robert Grisham. 14 August 2007
To be effective, a linear congruential random number generator (LCG) should produce values that are (a) uniformly distributed on the open unit interval (0,1) and (b) substantially free of serial correlation. Many statistical methods have been found to produce inflated Type I error rates for correlated observations. Theoretically, independently seeding an LCG under the following conditions attenuates serial correlation: (a) simple random sampling of seeds, (b) non-replicate streams, (c) non-overlapping streams, and (d) non-adjoining streams. Accordingly, 4 algorithms (each satisfying at least 1 condition) were developed: (a) zero-leap, (b) fixed-leap, (c) scaled random-leap, and (d) unscaled random-leap. Only the last of these, the unscaled random-leap algorithm, satisfied all 4 independent seeding conditions.
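The idea of spacing seeds along the generator's cycle can be sketched as follows. This is a minimal sketch, not the thesis's implementation: the abstract does not state the generator constants or define the four leap algorithms, so the Park-Miller parameters and the names `lehmer_stream` and `leap_seeds` are assumptions chosen for illustration.

```python
# Park-Miller "minimal standard" Lehmer parameters (an assumption; the
# abstract does not name the multiplier or modulus that was studied).
M = 2**31 - 1   # prime modulus
A = 48271       # multiplier


def lehmer_stream(seed, n):
    """Return n variates in (0, 1) from x_{k+1} = A * x_k mod M."""
    x = seed
    out = []
    for _ in range(n):
        x = (A * x) % M
        out.append(x / M)  # x is in 1..M-1, so x / M is strictly inside (0, 1)
    return out


def leap_seeds(seed, n_streams, leap):
    """Return seeds spaced `leap` generator steps apart (hypothetical helper)."""
    jump = pow(A, leap, M)  # A**leap mod M: one multiply skips `leap` steps
    seeds = [seed]
    for _ in range(n_streams - 1):
        seeds.append((seeds[-1] * jump) % M)
    return seeds
```

With `leap` equal to the stream length, consecutive streams do not overlap but do adjoin; choosing `leap` larger than the stream length leaves a gap between them, which corresponds to the non-adjoining condition.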
To assess serial correlation, univariate and multivariate simulations were conducted at 3 equally spaced intervals for each algorithm (N=24) and measured using 3 randomness tests: (a) the serial correlation test, (b) the runs up test, and (c) the white noise test. A one-way balanced multivariate analysis of variance (MANOVA) was used to test 4 hypotheses: (a) omnibus, (b) contrast of unscaled vs. others, (c) contrast of scaled vs. others, and (d) contrast of fixed vs. others. The MANOVA assumptions of independence, normality, and homogeneity were satisfied.
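As an illustration of what a serial correlation measurement looks at, the lag-1 sample autocorrelation of a sequence can be computed as below. This is a sketch under assumptions: the abstract does not give the exact test statistic used in the study, and `lag1_serial_correlation` is a hypothetical name.

```python
def lag1_serial_correlation(u):
    """Lag-1 sample autocorrelation of a sequence of numbers."""
    n = len(u)
    mean = sum(u) / n
    # Covariance between each value and its successor (unnormalised).
    num = sum((u[i] - mean) * (u[i + 1] - mean) for i in range(n - 1))
    # Total sum of squares about the mean (unnormalised variance).
    den = sum((v - mean) ** 2 for v in u)
    return num / den
```

Values near zero are what a well-behaved stream should produce; a strongly alternating sequence gives a value near -1 and a steadily increasing one gives a value near +1.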
In sum, the seeding algorithms did not differ significantly from each other (omnibus hypothesis). For the contrast hypotheses, only the fixed-leap algorithm differed significantly from all other algorithms. Surprisingly, the scaled random-leap offered the least difference among the algorithms (theoretically this algorithm should have produced the second largest difference). Although not fully supported by the research design used in this study, it is thought that the unscaled random-leap algorithm is the best choice for independently seeding the multiplicative congruential random number generator. Accordingly, suggestions for further research are proposed.
|
2 |
Geometric algorithms for component analysis with a view to gene expression data analysis. Journée, Michel. 04 June 2009
The research reported in this thesis addresses the problem of component analysis, which aims at reducing large data to lower dimensions, to reveal the essential structure of the data. This problem is encountered in almost all areas of science - from physics and biology to finance, economics and psychometrics - where large data sets need to be analyzed.
Several paradigms for component analysis are considered, e.g., principal component analysis, independent component analysis and sparse principal component analysis. Each is naturally formulated as an optimization problem subject to constraints that endow the problem with a well-characterized matrix manifold structure. Component analysis is thus cast in the realm of optimization on matrix manifolds. Algorithms for component analysis are subsequently derived that take advantage of the geometrical structure of the problem.
When formalizing component analysis into an optimization framework, three main classes of problems are encountered, for which methods are proposed. We first consider the problem of optimizing a smooth function on the set of n-by-p real matrices with orthonormal columns. Then, a method is proposed to maximize a convex function on a compact manifold, which generalizes to this context the well-known power method that computes the dominant eigenvector of a matrix. Finally, we address the issue of solving problems defined in terms of large positive semidefinite matrices in a numerically efficient manner by using low-rank approximations of such matrices.
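For reference, the classical power method that the second class of problems generalizes can be sketched as follows. This is only the textbook iteration for a symmetric positive semidefinite matrix, not the thesis's manifold algorithm; the function name `power_method`, the starting vector, and the fixed iteration count are assumptions made for illustration.

```python
import math


def power_method(A, iters=200):
    """Approximate the dominant eigenpair of a symmetric PSD matrix A (list of rows)."""
    n = len(A)
    x = [1.0 / math.sqrt(n)] * n  # unit-norm start (arbitrary choice)
    for _ in range(iters):
        # Multiply by A, then project back onto the unit sphere.
        y = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    # Rayleigh quotient x^T A x estimates the dominant eigenvalue.
    eigenvalue = sum(
        x[i] * sum(A[i][j] * x[j] for j in range(n)) for i in range(n)
    )
    return x, eigenvalue
```

Each step maximizes the linearization of the convex function x^T A x over the unit sphere, which is the viewpoint that extends to other convex functions and compact manifolds. The iteration fails if the starting vector is orthogonal to the dominant eigenvector.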
The efficiency of the proposed algorithms for component analysis is evaluated on gene expression data related to breast cancer, which encode the expression levels of thousands of genes obtained from experiments on hundreds of cancerous cells. Such data provide a snapshot of the biological processes that occur in tumor cells and offer huge opportunities for an improved understanding of cancer. Thanks to an original framework for evaluating the biological significance of a set of components, both well-known and novel knowledge is inferred about the biological processes that underlie breast cancer.
Hence, to summarize the thesis in one sentence: We adopt a geometric point of view to propose optimization algorithms for component analysis which, applied to large gene expression data sets, reveal novel biological knowledge.
|
3 |
Decoupling control in statistical sense: minimised mutual information algorithm. Zhang, Qichun; Wang, A. 03 October 2019
This paper presents a novel concept to describe the couplings among the outputs of stochastic systems represented by NARMA models. Compared with the traditional coupling description, the presented concept can be considered an extension based on statistical independence theory. Based on this concept, decoupling control in the statistical sense is established, with necessary and sufficient conditions for complete decoupling. Since complete decoupling is difficult to achieve, a control algorithm has been developed using the Cauchy-Schwarz mutual information criterion. Without modifying the existing control loop, this algorithm supplies a compensative controller to minimise the statistical couplings of the system outputs, and the local stability has been analysed. In addition, a further discussion illustrates the combination of the presented control algorithm with data-based mutual information estimation. Finally, a numerical example is given to show the feasibility and efficiency of the proposed algorithm.
|
4 |
Independência parcial no problema da satisfazibilidade probabilística / Partial Independence in the Probabilistic Satisfiability Problem. Morais, Eduardo Menezes de. 20 April 2018
The Probabilistic Satisfiability Problem, PSAT, despite its flexibility, makes modeling statistically independent variables exponentially complex. This thesis seeks to develop algorithms and relaxation proposals that allow an efficient treatment of partial independence with PSAT. We also present an application of PSAT to the part-of-speech tagging problem, which serves both as motivation and as a demonstration of the presented concepts.
|
6 |
Independent component analysis and slow feature analysis. Blaschke, Tobias. 25 May 2005
Within this thesis, we focus on the relation between independent component analysis (ICA) and slow feature analysis (SFA). To allow a comparison between both methods we introduce CuBICA2, an ICA algorithm based on second-order statistics only, i.e. cross-correlations. In contrast to algorithms based on higher-order statistics, not only instantaneous cross-correlations but also time-delayed cross-correlations are considered for minimization, in order to reduce the statistical dependence between signal components. Like SFA, CuBICA2 requires signal components with auto-correlation, and it is able to separate source signal components that have a Gaussian distribution. Furthermore, we derive an alternative formulation of the SFA objective function and compare it with that of CuBICA2. In the case of a linear mixture the two methods are equivalent if a single time delay is taken into account. The comparison cannot be extended to the case of several time delays. For ICA a straightforward extension can be derived, but a similar extension to SFA yields an objective function that cannot be interpreted in the sense of SFA, which extracts the most slowly varying signal components from a given input signal. However, a useful extension in the sense of SFA to more than one time delay can be derived. This extended SFA reveals the close connection between the slowness objective of SFA and temporal predictability. Furthermore, we combine CuBICA2 and SFA. The result can be interpreted from two perspectives. From the ICA point of view the combination leads to an algorithm that solves the nonlinear blind source separation problem. From the SFA point of view the combination of ICA and SFA is an extension to SFA in terms of statistical independence. Standard SFA extracts slowly varying signal components that are uncorrelated, meaning they are statistically independent up to second order. The integration of ICA leads to signal components that are more or less statistically independent.
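The link between slowness and a single time-delayed correlation can be illustrated with a short derivation. This is a sketch under stated assumptions, not the derivation from the thesis: it takes a discrete-time signal y(t) with zero mean and unit variance, approximates the time derivative by a one-step difference, writes ⟨·⟩ for the time average, and ignores boundary effects.

```latex
\begin{align*}
\Delta(y) &= \big\langle \big(y(t+1) - y(t)\big)^2 \big\rangle \\
          &= \langle y(t+1)^2 \rangle + \langle y(t)^2 \rangle
             - 2\,\langle y(t+1)\, y(t) \rangle \\
          &= 2 - 2\,\langle y(t+1)\, y(t) \rangle .
\end{align*}
```

Under these assumptions, minimizing the slowness measure Δ(y) is the same as maximizing the autocorrelation at a delay of one step, which is consistent with the stated equivalence for a single time delay.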
|