  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
201

Donsker classes, Vapnik-Chervonenkis classes, and chi-squared tests of fit with random cells

Durst, Mark Joseph January 1980
Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Mathematics, 1980. / MICROFICHE COPY AVAILABLE IN ARCHIVES AND SCIENCE. / Bibliography: leaves 91-93. / by Mark Joseph Durst. / Ph.D.
202

Exact test for an epidemic change in a sequence of exponentially distributed random variables.

Lai Kim Fung January 2005
Lai Kim Fung. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2005. / Includes bibliographical references (leaves 55-57). / Abstracts in English and Chinese.
Chapter 1: Introduction
Chapter 2: Likelihood Ratio Test Statistic
  2.1 Introduction
  2.2 Formulation
  2.3 Likelihood Ratio Type Statistic
  2.4 Dirichlet Distribution
  2.5 Edgeworth Expansion
Chapter 3: Divided Difference
  3.1 Introduction
  3.2 Definition of Divided Difference
  3.3 Theorem
  3.4 Proof of the Theorem
  3.5 Application of Theorem
Chapter 4: Computational Results
  4.1 Introduction
  4.2 Critical Values for Moderate and Large Sample Sizes
  4.3 Critical Values for Small Sample Sizes
    4.3.1 Exact Critical Values
    4.3.2 Edgeworth Expansion Results
    4.3.3 Simulation Results
  4.4 Power
Chapter 5: Illustrative Examples
  5.1 Stanford Heart Transplant Data
    5.1.1 The Data
    5.1.2 Result
  5.2 Air Conditioning Data
    5.2.1 The Data
    5.2.2 Result
  5.3 Insulating Fluid Failure Data
    5.3.1 The Data
    5.3.2 Result
Chapter 6: Conclusion and Further Research Topic
  6.1 Conclusion
  6.2 Further Research Topic
Appendix A
Appendix B
Bibliography
203

Networks: a random walk in degree space / Redes: um passeio aleatório no espaço dos graus

Fernanda Ampuero 18 May 2018
The present work aims to contribute to the study of networks by mapping the temporal evolution of the degree to a random walk in degree space. We analyzed how and when the degree approaches a pre-established value through a parallel with the first-passage problem of random walks. The mean first-passage time was calculated for the dynamical versions of the Watts-Strogatz and Erdős-Rényi models. We also analyzed the degree variance for the random recursive tree and Barabási-Albert models.
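The first-passage idea in this abstract can be illustrated with a small Monte Carlo sketch. It is not one of the thesis's network models: it assumes a hypothetical node whose degree goes up by one with constant probability `p_up`, down by one with constant probability `p_down` (never below zero), and otherwise stays put. All function and parameter names are illustrative.

```python
import random


def first_passage_time(start, target, p_up, p_down, rng, max_steps=1_000_000):
    """Number of steps until the degree first equals `target`.

    Returns None if the target is not reached within `max_steps` steps.
    The constant step probabilities are an assumption of this sketch.
    """
    degree = start
    for step in range(max_steps + 1):
        if degree == target:
            return step
        u = rng.random()
        if u < p_up:
            degree += 1          # the node gains a link
        elif u < p_up + p_down and degree > 0:
            degree -= 1          # the node loses a link (degree stays >= 0)
    return None


def mean_first_passage_time(start, target, p_up, p_down, trials=20_000, seed=0):
    """Average the first-passage time over independent simulated walks."""
    rng = random.Random(seed)
    times = [first_passage_time(start, target, p_up, p_down, rng)
             for _ in range(trials)]
    reached = [t for t in times if t is not None]
    return sum(reached) / len(reached)


# Hypothetical example: a node of degree 5 drifting upward toward degree 10.
mean_time = mean_first_passage_time(start=5, target=10, p_up=0.6, p_down=0.2)
print(f"estimated mean first-passage time: {mean_time:.2f} steps")
```

For a walk with upward drift `p_up - p_down` and a target far from the floor at zero, the estimate should sit close to the distance divided by the drift, which gives a quick sanity check on the simulation.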
204

Studying the ability of finding single and interaction effects with Random Forest, and its application in psychiatric genetics

Neira Gonzalez, Lara Andrea January 2018
Psychotic disorders such as schizophrenia and bipolar disorder have a strong genetic component. The aetiology of psychoses is known to be complex, including additive effects from multiple susceptibility genes, interactions between genes, environmental risk factors, and gene-by-environment interactions. With the development of new technologies such as genome-wide association studies and imputation of ungenotyped variants, the amount of genomic data has increased dramatically, making Machine Learning techniques necessary. Random Forest has been widely used to study the underlying genetic factors of psychiatric disorders, such as epistasis and gene-gene interactions. Several authors have investigated the ability of this algorithm to find single and interaction effects, but have reported contradictory results. Therefore, to examine the ability of Random Forest to detect single and interaction effects based on different variable importance measures, I conducted a simulation study assessing whether the algorithm was able to detect single and interaction models under different correlation conditions. The results suggest that the optimal variable importance measure to use in real situations under correlation is the unconditional unscaled permutation variable importance measure. Several studies have shown bias in one of the most popular variable importance measures, the Gini importance. Hence, in a second simulation study I examined whether the Gini variable importance is influenced by the variability of predictors, the precision with which they are measured, and the variability of the error. Evidence of other biases in this variable importance measure was found. The results from the first simulation study were used to study whether genes related to 29 molecular biomarkers, which have been associated with schizophrenia, influence risk for schizophrenia in a case-control study of 26476 cases and 31804 controls from 39 different European ancestry cohorts. Single effects from the ACAT2 and TNC genes were found to contribute to risk for schizophrenia. ACAT2 is a gene on chromosome 6 that is related to energy metabolism. Transcriptional differences have been shown in schizophrenia brain tissue studies. TNC is expressed in the brain, where it is involved in the migration of neurons and axons. In addition, we also used the simulation results to examine whether interactions between genes associated with abnormal emotion/affect behaviour influence risk for psychosis and cognition in humans, in a case-control study of 2049 cases and 1794 controls. Before correcting for multiple testing, significant interactions between CRHR1 and ESR1, between MAPT and ESR1, among CRHR1, ESR1 and TOM1L2, and among MAPT, ESR1 and TOM1L2 were observed in the abnormal fear/anxiety-related behaviour pathway. There was no evidence for epistasis after Bonferroni correction.
205

Temps local et diffusion en environnement aléatoire / Local time and diffusion in random environment

Diel, Roland 03 December 2010
A diffusion in a random environment is the solution of the following stochastic differential equation: dX(t) = dB(t) − (1/2) W'(X(t)) dt, where B is a standard Brownian motion and W, the environment, is a càdlàg process that is not necessarily differentiable (so the SDE has only a formal meaning). Schumacher [69] and Brox [17] have shown that the diffusion X behaves sub-diffusively when W is also a standard Brownian motion, and that X localizes in the neighborhood of certain points of the environment. This thesis is principally devoted to describing the asymptotic behavior of the local time process of X. The local time L_X(t, x) represents the time spent by X at the point x before time t, which makes it a useful tool for studying the localization of the diffusion. We describe the limit law of the local time when the environment is a standard Brownian motion or, more generally, a stable Lévy process. We also study the time spent by X in the neighborhood of the most visited points and the almost sure asymptotic behavior of the maximum of the local time. In the last part of the thesis, the local time of a discrete version of the model is used to obtain information about the environment, with the goal of applying this model to DNA sequencing.
206

Investigations of Variable Importance Measures Within Random Forests

Merrill, Andrew C. 01 May 2009
Random Forests (RF) (Breiman 2001; Breiman and Cutler 2004) is a completely nonparametric statistical learning procedure that may be used for regression analysis and classification. A feature of RF that is drawing a lot of attention is the novel algorithm used to evaluate the relative importance of the predictor/explanatory variables. Other machine learning algorithms for regression and classification, such as support vector machines and artificial neural networks (Hastie et al. 2009), exhibit high predictive accuracy but provide little insight into the predictive power of individual variables. In contrast, the permutation algorithm of RF has already established a track record for identification of important predictors (Huang et al. 2005; Cutler et al. 2007; Archer and Kimes 2008). Recently, however, some authors (Nicodemus and Shugart 2007; Strobl et al. 2007, 2008) have shown that the presence of categorical variables with many categories (Strobl et al. 2007) or high collinearity gives unduly large variable importance under the standard RF permutation algorithm (Strobl et al. 2008). This work creates simulations from multiple linear regression models with small numbers of variables to understand the issues raised by Strobl et al. (2008) regarding shortcomings of the original RF variable importance algorithm and the alternatives implemented in conditional forests (Strobl et al. 2008). In addition, this paper looks at the dependence of RF variable importance values on user-defined parameters.
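The permutation idea behind this kind of variable importance can be sketched in a few lines. This is a simplified illustration, not the RF algorithm itself: it assumes an arbitrary fitted `predict` function (here a hypothetical fixed linear predictor standing in for a forest) and measures the increase in mean squared error on the supplied data, whereas RF uses out-of-bag samples tree by tree. All names and the synthetic data are illustrative.

```python
import random


def mse(predict, X, y):
    """Mean squared error of `predict` over rows X with targets y."""
    return sum((predict(row) - t) ** 2 for row, t in zip(X, y)) / len(y)


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Average increase in MSE when one column of X is shuffled.

    A large increase suggests the model relies on that variable; an
    increase of zero means the prediction does not depend on it.
    """
    rng = random.Random(seed)
    baseline = mse(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between variable j and y
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(predict, X_perm, y) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Synthetic data (an assumption of this sketch): y depends strongly on x0,
# weakly on x1, and not at all on x2.
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1) for _ in range(3)] for _ in range(500)]
y = [3.0 * r[0] + 0.5 * r[1] + data_rng.gauss(0, 0.1) for r in X]


def predict(row):
    """Hypothetical fitted model standing in for a random forest."""
    return 3.0 * row[0] + 0.5 * row[1]


importances = permutation_importance(predict, X, y)
for name, value in zip(["x0", "x1", "x2"], importances):
    print(f"{name}: {value:.3f}")
```

Because the columns here are generated independently, the ranking is clean; the concerns about correlated predictors raised in the abstract arise precisely when that independence fails.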
207

Regularized Discriminant Analysis: A Large Dimensional Study

Yang, Xiaoke 28 April 2018
In this thesis, we study the performance of general regularized discriminant analysis (RDA) classifiers. The data are assumed to follow a Gaussian mixture model with different means and covariances. RDA offers a rich class of regularization options, covering as special cases the regularized linear discriminant analysis (RLDA) and the regularized quadratic discriminant analysis (RQDA) classifiers. We analyze RDA under the double asymptotic regime in which the data dimension and the training size grow proportionally. This regime allows fundamental results from random matrix theory to be applied. Under the double asymptotic regime and some mild assumptions, we show that the asymptotic classification error converges to a deterministic quantity that depends only on the statistical parameters and dimensions of the data. This result not only reveals mathematical relations between the misclassification error and the class statistics, but can also be leveraged to select the parameters that minimize the classification error, thus yielding the optimal classifier. Validation on synthetic data shows that our theoretical findings are accurate. We also construct a general consistent estimator to approximate the true classification error, since the underlying statistics are unknown in practice. We benchmark the proposed consistent estimator against classical estimators on synthetic data. The observations demonstrate that the general estimator outperforms the others in terms of mean squared error (MSE).
208

Random Relational Rules

Anderson, Grant January 2008
In the field of machine learning, methods for learning from single-table data have received much more attention than those for learning from multi-table, or relational data, which are generally more computationally complex. However, a significant amount of the world's data is relational. This indicates a need for algorithms that can operate efficiently on relational data and exploit the larger body of work produced in the area of single-table techniques. This thesis presents algorithms for learning from relational data that mitigate, to some extent, the complexity normally associated with such learning. All algorithms in this thesis are based on the generation of random relational rules. The assumption is that random rules enable efficient and effective relational learning, and this thesis presents evidence that this is indeed the case. To this end, a system for generating random relational rules is described, and algorithms using these rules are evaluated. These algorithms include direct classification, classification by propositionalisation, clustering, semi-supervised learning and generating random forests. The experimental results show that these algorithms perform competitively with previously published results for the datasets used, while often exhibiting lower runtime than other tested systems. This demonstrates that sufficient information for classification and clustering is retained in the rule generation process and that learning with random rules is efficient. Further applications of random rules are investigated. Propositionalisation allows single-table algorithms for classification and clustering to be applied to the resulting data, reducing the amount of relational processing required. Further results show that techniques for utilising additional unlabeled training data improve accuracy of classification in the semi-supervised setting. 
The thesis also develops a novel algorithm for building random forests by making efficient use of random rules to generate trees and leaves in parallel.
209

Contributions to the estimation of probabilistic discriminative models: semi-supervised learning and feature selection

Sokolovska, Nataliya 25 February 2010
In this thesis we study the estimation of discriminative probabilistic models, focusing on semi-supervised learning and feature selection. The goal of semi-supervised learning is to improve the efficiency of supervised learning by using unlabeled data. This goal is difficult to achieve for discriminative models. Discriminative probabilistic models make it possible to handle rich linguistic representations in the form of very high-dimensional feature vectors. Working in high dimension raises problems, computational ones in particular, which are exacerbated for sequence models such as conditional random fields (CRFs). Our contribution is twofold. We introduce an original and simple method for integrating unlabeled data into a semi-supervised objective function. We then show that the corresponding semi-supervised estimator is asymptotically optimal. The case of logistic regression is illustrated by experimental results. In this study, we propose an estimation algorithm for CRFs that performs model selection by means of an $L_1$ penalty. We also present the results of experiments carried out on natural language processing tasks (chunking and named entity recognition), analyzing generalization performance and the selected features. Finally, we propose several directions for improving the computational efficiency of this technique.
210

Measure-equivalence of quadratic forms

Limmer, Douglas J. 07 May 1999
This paper examines the probability that a random polynomial of specific degree over a field has a specific number of distinct roots in that field. Probabilities are found for random quadratic polynomials with respect to various probability measures on the real numbers and p-adic numbers. In the process, some properties of the p-adic integer uniform random variable are explored. The measure Witt ring, a generalization of the canonical Witt ring, is introduced as a way to link quadratic forms and measures, and examples are found for various fields and measures. Special properties of the Haar measure in connection with the measure Witt ring are explored. Higher-degree polynomials are explored with the aid of numerical methods, and some conjectures are made regarding higher-degree p-adic polynomials. Other open questions about measure Witt rings are stated. / Graduation date: 1999
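The kind of probability studied here can be estimated numerically for one simple case. The sketch below assumes, purely for illustration, that the coefficients a, b, c of ax^2 + bx + c are independent and uniform on [-1, 1]; this is only one of many possible measures on the reals and is not claimed to be the one used in the thesis. It counts how often the discriminant b^2 - 4ac is strictly positive, which is the condition for two distinct real roots.

```python
import random


def has_two_distinct_real_roots(a, b, c):
    """True if a*x^2 + b*x + c is a genuine quadratic with discriminant > 0."""
    return a != 0 and b * b - 4.0 * a * c > 0.0


def estimate_probability(n_samples=200_000, seed=0):
    """Monte Carlo estimate under the assumed uniform[-1, 1] coefficients."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        a, b, c = (rng.uniform(-1.0, 1.0) for _ in range(3))
        if has_two_distinct_real_roots(a, b, c):
            hits += 1
    return hits / n_samples


estimate = estimate_probability()
print(f"estimated P(two distinct real roots): {estimate:.4f}")
```

For this particular measure a direct integration gives (41 + 6 ln 2)/72, roughly 0.627, so the simulated value can be checked against a closed form; other measures, including the p-adic ones discussed in the abstract, need their own treatment.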
