21

Eulerian calculus arising from permutation statistics / Calcul Eulériens sur permutations

Lin, Zhicong 29 April 2014
En 2010 Chung, Graham et Knuth ont démontré une remarquable identité symétrique sur les nombres eulériens et posé le problème de trouver un q-analogue de leur identité. En utilisant les q-polynômes eulériens introduits par Shareshian-Wachs, nous avons pu obtenir une telle q-identité. La preuve bijective que nous avons imaginée nous a permis ensuite de démontrer d'autres q-identités symétriques, en utilisant un modèle combinatoire dû à Foata-Han. Entre-temps, Hyatt a introduit les fonctions quasisymétriques eulériennes colorées afin d'étudier la distribution conjointe du nombre d'excédances et de l'indice majeur sur les permutations colorées. En appliquant le Decrease Value Theorem de Foata-Han, nous donnons d'abord une nouvelle preuve de sa formule principale sur la fonction génératrice des fonctions quasisymétriques eulériennes colorées, puis généralisons certaines identités eulériennes symétriques, en les exprimant comme des identités sur les fonctions quasisymétriques eulériennes colorées. D'autre part, en prolongeant les travaux récents de Savage-Visontai et Beck-Braun, nous considérons plusieurs q-polynômes de descente des mots signés. Leurs fonctions génératrices factorielles et multivariées sont explicitement calculées. Par ailleurs, nous montrons que certains de ces polynômes n'ont que des zéros réels. Enfin, nous étudions la fonction génératrice diagonale des nombres de Jacobi-Stirling de deuxième espèce, en généralisant des résultats analogues pour les nombres de Stirling et Legendre-Stirling de deuxième espèce. Il s'avère que cette fonction génératrice est une série rationnelle dont le numérateur est un polynôme à coefficients entiers positifs. En appliquant la théorie des P-partitions de Stanley, nous trouvons des interprétations combinatoires de ces coefficients. / In 2010 Chung-Graham-Knuth proved an interesting symmetric identity for the Eulerian numbers and asked for a q-analog version. Using the q-Eulerian polynomials introduced by Shareshian-Wachs we find such a q-identity. Moreover, we provide a bijective proof that we further generalize to prove other symmetric q-identities using a combinatorial model due to Foata-Han. Meanwhile, Hyatt has introduced the colored Eulerian quasisymmetric functions to study the joint distribution of the excedance number and major index on colored permutations. Using the Decrease Value Theorem of Foata-Han we give a new proof of his main generating function formula for the colored Eulerian quasisymmetric functions. Furthermore, certain symmetric q-Eulerian identities are generalized and expressed as identities involving the colored Eulerian quasisymmetric functions. Next, generalizing the recent works of Savage-Visontai and Beck-Braun we investigate some q-descent polynomials of general signed multipermutations. The factorial and multivariate generating functions for these q-descent polynomials are obtained and the real rootedness results of some of these polynomials are given. Finally, we study the diagonal generating function of the Jacobi-Stirling numbers of the second kind by generalizing the analogous results for the Stirling and Legendre-Stirling numbers of the second kind. It turns out that the generating function is a rational function, whose numerator is a polynomial with nonnegative integral coefficients. By applying Stanley’s theory of P-partitions we find combinatorial interpretations of those coefficients.
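
As a concrete illustration of the permutation statistics involved (a sketch added to this listing, not taken from the thesis), the following Python snippet computes the Eulerian distribution by counting descents over small symmetric groups and checks the classical fact that excedances are equidistributed with descents; the range n ≤ 6 is an arbitrary choice.

    from itertools import permutations
    from collections import Counter

    def descents(p):
        # number of positions i with p(i) > p(i+1)
        return sum(1 for i in range(len(p) - 1) if p[i] > p[i + 1])

    def excedances(p):
        # number of positions i with p(i) > i (positions counted from 1)
        return sum(1 for i, v in enumerate(p, start=1) if v > i)

    def distribution(stat, n):
        return Counter(stat(p) for p in permutations(range(1, n + 1)))

    for n in range(1, 7):
        des = distribution(descents, n)
        exc = distribution(excedances, n)
        assert des == exc                        # both statistics are Eulerian
        print(n, [des[k] for k in range(n)])     # row n of the Eulerian triangle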
22

Statistiques géométriques pour l'anatomie numérique / Geometric statistics for computational anatomy

Miolane, Nina 16 December 2016
Cette thèse développe les statistiques géométriques pour l'analyse de la variabilité normale et pathologique des formes d'organe en anatomie numérique. Les statistiques géométriques s’intéressent aux données issues de variétés avec structures géométriques additionnelles. En anatomie numérique, les formes d'un organe peuvent être vues comme des déformations d'un organe de référence - i.e. comme éléments d'un groupe de Lie, une variété avec une structure de groupe - ou comme les classes d'équivalence de leur configuration 3D sous l'action de transformations - i.e. comme éléments d'un quotient, une variété avec une stratification. Les images médicales peuvent être représentées par des variétés avec une distribution horizontale. La contribution de cette thèse est d'étendre les statistiques géométriques au-delà des géométries riemanniennes ou métriques maintenant classiques pour prendre en compte des structures additionnelles. Premièrement, nous définissons les statistiques géométriques sur les groupes de Lie. Nous proposons une construction algorithmique de (pseudo-)métrique riemannienne, compatible avec la structure de groupe, lorsqu'elle existe. Nous trouvons que certains groupes n'admettent pas de telle (pseudo-)métrique et défendons l'idée de statistiques non-métriques sur les groupes de Lie. Ensuite, nous utilisons les statistiques géométriques pour analyser l'algorithme de calcul d'organe de référence, reformulé avec des espaces quotients. Nous montrons son biais et suggérons un algorithme amélioré. Enfin, nous appliquons les statistiques géométriques au traitement d'images, en généralisant les structures sous-riemanniennes, utilisées en 2D, au 3D. / This thesis develops Geometric Statistics to analyze the normal and pathological variability of organ shapes in Computational Anatomy. Geometric statistics consider data that belong to manifolds with additional geometric structures. In Computational Anatomy, organ shapes may be modeled as deformations of a template - i.e. as elements of a Lie group, a manifold with a group structure - or as the equivalence classes of their 3D configurations under the action of transformations - i.e. as elements of a quotient space, a manifold with a stratification. Medical images can be modeled as manifolds with a horizontal distribution. The contribution of this thesis is to extend Geometric Statistics beyond the now classical Riemannian and metric geometries in order to account for these additional structures. First, we tackle the definition of Geometric Statistics on Lie groups. We provide an algorithm that constructs a (pseudo-)Riemannian metric compatible with the group structure when it exists. We find that some groups do not admit such a (pseudo-)metric and advocate for non-metric statistics on Lie groups. Second, we use Geometric Statistics to analyze the algorithm of organ template computation. We show its asymptotic bias by considering the geometry of quotient spaces. We illustrate the bias on brain templates and suggest an improved algorithm. We then show that registering organ shapes induces a bias in their statistical analysis, which we offer to correct. Third, we apply Geometric Statistics to medical image processing, providing the mathematics to extend sub-Riemannian structures, already used in 2D, to our 3D images.
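
As a toy illustration of statistics on a manifold (the unit sphere here is an assumption made for the example; it is not one of the shape spaces analysed in the thesis), the sketch below computes a Fréchet mean of points on the sphere by Riemannian gradient descent with the standard exponential and logarithm maps.

    import numpy as np

    def log_map(p, q):
        # tangent vector at p pointing towards q on the unit sphere
        cos_t = np.clip(p @ q, -1.0, 1.0)
        theta = np.arccos(cos_t)
        if theta < 1e-12:
            return np.zeros_like(p)
        v = q - cos_t * p
        return theta * v / np.linalg.norm(v)

    def exp_map(p, v):
        # follow the geodesic from p in the direction of the tangent vector v
        nv = np.linalg.norm(v)
        if nv < 1e-12:
            return p
        return np.cos(nv) * p + np.sin(nv) * v / nv

    def frechet_mean(points, n_iter=50):
        mean = points[0]
        for _ in range(n_iter):
            # gradient step: move along the average of the log-mapped points
            mean = exp_map(mean, np.mean([log_map(mean, q) for q in points], axis=0))
        return mean

    rng = np.random.default_rng(0)
    pts = rng.normal(scale=0.3, size=(20, 3)) + np.array([0.0, 0.0, 1.0])
    pts /= np.linalg.norm(pts, axis=1, keepdims=True)
    print(frechet_mean(pts))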
23

Optimization tools for non-asymptotic statistics in exponential families

Le Priol, Rémi 04 1900
Les familles exponentielles sont une classe de modèles omniprésente en statistique. D'une part, elle peut modéliser n'importe quel type de données. En fait la plupart des distributions communes en font partie : Gaussiennes, variables catégoriques, Poisson, Gamma, Wishart, Dirichlet. D'autre part elle est à la base des modèles linéaires généralisés (GLM), une classe de modèles fondamentale en apprentissage automatique. Enfin les mathématiques qui les sous-tendent sont souvent magnifiques, grâce à leur lien avec la dualité convexe et la transformée de Laplace. L'auteur de cette thèse a fréquemment été motivé par cette beauté. Dans cette thèse, nous faisons trois contributions à l'intersection de l'optimisation et des statistiques, qui tournent toutes autour de la famille exponentielle. La première contribution adapte et améliore un algorithme d'optimisation à variance réduite appelé ascension des coordonnées duales stochastique (SDCA), pour entraîner une classe particulière de GLM appelée champ aléatoire conditionnel (CRF). Les CRF sont un des piliers de la prédiction structurée. Les CRF étaient connus pour être difficiles à entraîner jusqu'à la découverte des techniques d'optimisation à variance réduite. Notre version améliorée de SDCA obtient des performances favorables comparées à l'état de l'art antérieur et actuel. La deuxième contribution s'intéresse à la découverte causale. Les familles exponentielles sont fréquemment utilisées dans les modèles graphiques, et en particulier dans les modèles graphiques causaux. Cette contribution mène l'enquête sur une conjecture spécifique qui a attiré l'attention dans de précédents travaux : les modèles causaux s'adaptent plus rapidement aux perturbations de l'environnement. Nos résultats, obtenus à partir de théorèmes d'optimisation, soutiennent cette hypothèse sous certaines conditions. Mais sous d'autres conditions, nos résultats contredisent cette hypothèse. Cela appelle à une précision de cette hypothèse, ou à une sophistication de notre notion de modèle causal. La troisième contribution s'intéresse à une propriété fondamentale des familles exponentielles. L'une des propriétés les plus séduisantes des familles exponentielles est la forme close de l'estimateur du maximum de vraisemblance (MLE), ou maximum a posteriori (MAP) pour un choix naturel de prior conjugué. Ces deux estimateurs sont utilisés presque partout, souvent sans même y penser. (Combien de fois calcule-t-on une moyenne et une variance pour des données en cloche sans penser au modèle Gaussien sous-jacent ?) Pourtant la littérature actuelle manque de résultats sur la convergence de ces modèles pour des tailles d'échantillons finis, lorsque l'on mesure la qualité de ces modèles avec la divergence de Kullback-Leibler (KL). Pourtant cette divergence est la mesure de différence standard en théorie de l'information. En établissant un parallèle avec l'optimisation, nous faisons quelques pas vers un tel résultat, et nous relevons quelques directions pouvant mener à des progrès, tant en statistiques qu'en optimisation. Ces trois contributions mettent des outils d'optimisation au service des statistiques dans les familles exponentielles : améliorer la vitesse d'apprentissage de GLM de prédiction structurée, caractériser la vitesse d'adaptation de modèles causaux, estimer la vitesse d'apprentissage de modèles omniprésents. En traçant des ponts entre statistiques et optimisation, cette thèse fait progresser notre maîtrise de méthodes fondamentales d'apprentissage automatique.
/ Exponential families are a ubiquitous class of models in statistics. On the one hand, they can model any data type. Actually, the most common distributions are exponential families: Gaussians, categorical, Poisson, Gamma, Wishart, or Dirichlet. On the other hand, they sit at the core of generalized linear models (GLM), a foundational class of models in machine learning. They are also supported by beautiful mathematics thanks to their connection with convex duality and the Laplace transform. This beauty is definitely responsible for the existence of this thesis. In this manuscript, we make three contributions at the intersection of optimization and statistics, all revolving around exponential families. The first contribution adapts and improves a variance reduction optimization algorithm called stochastic dual coordinate ascent (SDCA) to train a particular class of GLM called conditional random fields (CRF). CRF are one of the cornerstones of structured prediction. CRF were notoriously hard to train until the advent of variance reduction techniques, and our improved version of SDCA performs favorably compared to the previous state-of-the-art. The second contribution focuses on causal discovery. Exponential families are widely used in graphical models, and in particular in causal graphical models. This contribution investigates a specific conjecture that gained some traction in previous work: causal models adapt faster to perturbations of the environment. Using results from optimization, we find strong support for this assumption when the perturbation is coming from an intervention on a cause, and support against this assumption when the perturbation is coming from an intervention on an effect. These pieces of evidence call for a refinement of the conjecture. The third contribution addresses a fundamental property of exponential families. One of the most appealing properties of exponential families is their closed-form maximum likelihood estimate (MLE) and maximum a posteriori (MAP) for a natural choice of conjugate prior. These two estimators are used almost everywhere, often unknowingly -- how often are mean and variance computed for bell-shaped data without thinking about the underlying Gaussian model? Nevertheless, the literature to date lacks results on the finite-sample convergence of the information (Kullback-Leibler) divergence between these estimators and the true distribution. Drawing on a parallel with optimization, we take some steps towards such a result, and we highlight directions for progress both in statistics and optimization. These three contributions all use tools from optimization in the service of statistics in exponential families: improving upon an algorithm to learn GLM, characterizing the adaptation speed of causal models, and estimating the learning speed of ubiquitous models. By tying together optimization and statistics, this thesis takes a step towards a better understanding of the fundamentals of machine learning.
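
To make the setting of the third contribution concrete (an illustrative sketch, not the thesis's analysis; the Gaussian example and the sample sizes are arbitrary choices), the closed-form Gaussian MLE can be compared to the true distribution in Kullback-Leibler divergence as the sample grows:

    import numpy as np

    def kl_gaussian(mu1, var1, mu0, var0):
        # KL( N(mu1, var1) || N(mu0, var0) )
        return 0.5 * (np.log(var0 / var1) + (var1 + (mu1 - mu0) ** 2) / var0 - 1.0)

    rng = np.random.default_rng(0)
    mu0, var0 = 2.0, 3.0                      # true parameters
    for n in [10, 100, 1000, 10000]:
        x = rng.normal(mu0, np.sqrt(var0), size=n)
        mu_hat, var_hat = x.mean(), x.var()   # closed-form Gaussian MLE
        print(n, kl_gaussian(mu_hat, var_hat, mu0, var0))   # decays roughly like 1/n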
24

Conditional Noise-Contrastive Estimation : With Application to Natural Image Statistics / Uppskattning via betingat kontrastivt brus

Ceylan, Ciwan January 2017
Unnormalised parametric models are an important class of probabilistic models which are difficult to estimate. The models are important since they occur in many different areas of application, e.g. in modelling of natural images, natural language and associative memory. However, standard maximum likelihood estimation is not applicable to unnormalised models, so alternative methods are required. Noise-contrastive estimation (NCE) has been proposed as an effective estimation method for unnormalised models. The basic idea is to transform the unsupervised estimation problem into a supervised classification problem. The parameters of the unnormalised model are learned by training the model to differentiate the given data samples from generated noise samples. However, the choice of the noise distribution has been left open to the user, and as the performance of the estimation may be sensitive to this choice, it is desirable for it to be automated. In this thesis, the ambiguity in the choice of the noise distribution is addressed by presenting the previously unpublished conditional noise-contrastive estimation (CNCE) method. Like NCE, CNCE estimates unnormalised models by classifying data and noise samples. However, the choice of noise distribution is partly automated via the use of a conditional noise distribution that is dependent on the data. In addition to introducing the core theory for CNCE, the method is empirically validated on data and models where the ground truth is known. Furthermore, CNCE is applied to natural image data to show its applicability in a realistic application. / Icke-normaliserade parametriska modeller utgör en viktig klass av svåruppskattade statistiska modeller. Dessa modeller är viktiga eftersom de uppträder inom många olika tillämpningsområden, t.ex. vid modellering av bilder, tal och skrift och associativt minne. Dessa modeller är svåruppskattade eftersom den vanliga maximum likelihood-metoden inte är tillämpbar på icke-normaliserade modeller. Noise-contrastive estimation (NCE) har föreslagits som en effektiv metod för uppskattning av icke-normaliserade modeller. Grundidén är att transformera det icke-handledda uppskattningsproblemet till ett handlett klassificeringsproblem. Den icke-normaliserade modellens parametrar blir inlärda genom att träna modellen på att skilja det givna dataprovet från ett genererat brusprov. Dock har valet av brusdistribution lämnats öppet för användaren. Eftersom uppskattningens prestanda är känslig gentemot det här valet är det önskvärt att få det automatiserat. I det här examensarbetet behandlas valet av brusdistribution genom att presentera den tidigare opublicerade metoden conditional noise-contrastive estimation (CNCE). Liksom NCE uppskattar CNCE icke-normaliserade modeller via klassificering av data- och brusprov. I det här fallet är emellertid brusdistributionen delvis automatiserad genom att använda en betingad brusdistribution som är beroende på dataprovet. Förutom att introducera kärnteorin för CNCE valideras även metoden med hjälp av data och modeller vars genererande parametrar är kända. Vidare appliceras CNCE på bilddata för att demonstrera dess tillämpbarhet.
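
A schematic illustration of the noise-contrastive idea described above (plain NCE, not the CNCE method introduced in the thesis; the unnormalised model, noise distribution, and sample sizes are assumptions made for the example): an unnormalised Gaussian with a free log-normalisation parameter is estimated by logistic classification of data against noise.

    import numpy as np
    from scipy.optimize import minimize
    from scipy.stats import norm

    rng = np.random.default_rng(0)
    lam_true = 4.0                                               # data ~ N(0, 1/lam_true)
    x = rng.normal(0.0, 1.0 / np.sqrt(lam_true), size=5000)      # data sample
    y = rng.normal(0.0, 1.0, size=5000)                          # noise sample ~ N(0, 1)

    def log_model(u, lam, c):
        # unnormalised log-density plus a free log-normalisation parameter c
        return -0.5 * lam * u ** 2 + c

    def nce_loss(theta):
        lam, c = theta
        gx = log_model(x, lam, c) - norm.logpdf(x)   # classifier log-odds on data
        gy = log_model(y, lam, c) - norm.logpdf(y)   # classifier log-odds on noise
        # logistic loss with data labelled 1 and noise labelled 0
        return np.logaddexp(0.0, -gx).mean() + np.logaddexp(0.0, gy).mean()

    lam_hat, c_hat = minimize(nce_loss, x0=np.array([1.0, 0.0])).x
    print(lam_hat, c_hat)   # lam_hat near 4, c_hat near 0.5*log(lam_true/(2*pi))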
25

Small Area Estimation in a Survey of Governments

Dumbacher, Brian Arthur 01 April 2016
Small area composite estimators are weighted averages that attempt to balance the variability of the direct survey estimator against the bias of the synthetic estimator. Direct and synthetic estimators have competing properties, and finding an optimal weighted average can be challenging. One example of a survey that utilizes small area estimation is the Annual Survey of Public Employment & Payroll (ASPEP), which is conducted by the U.S. Census Bureau to collect data on the number and pay of federal, state, and local government civilian employees. Estimates of local government totals are calculated for domains created by crossing state and government function. To calculate estimates at such a detailed level, the Census Bureau uses small area methods that take advantage of auxiliary information from the most recent Census of Governments (CoG). During ASPEP's 2009 sample design, a composite estimator was used, and it was observed that the direct estimator has the desirable property of being greater than the corresponding raw sum of the data, whereas the synthetic estimator has the desirable property of being close to the most recent CoG total. In this research, the design-based properties of various estimators and quantities in the composite methodology are studied via a large Monte Carlo simulation using CoG data. New estimators are constructed based on the underlying ideas of limited translation and James-Stein shrinkage. The simulation provides estimates of the design-based variance and mean squared error of every estimator under consideration, and more optimal domain-level composite weights are calculated. Based on simulation results, several limitations of the composite methodology are identified. Explicit area-level models are developed that try to capture the spirit of the composite methodology and address its limitations in a unified and generalizable way. The models consist of hierarchical Bayesian extensions of the Fay-Herriot model and are characterized by novel combinations of components allowing for correlated sampling errors, multilevel structure, and t-distributed errors. Estimated variances and covariances from the Monte Carlo simulation are incorporated to take ASPEP's complex sample design into account. Posterior predictive checks and cross-validated posterior predictive checks based on selective discrepancy measures are used to help assess model fit. It is observed that the t-normal models, which have t-distributed sampling errors, protect against unreasonable direct estimates and provide over-shrinkage towards the regression synthetic estimates. Also, the proportion of model estimates less than the corresponding raw sums is close to optimal. These empirical findings motivate a theoretical study of the shrinkage provided by the t-normal model. Another simulation is conducted to compare the shrinkage properties of this model and the Fay-Herriot model. The methods in this research apply not just to ASPEP, but also to other surveys of governments, surveys of business establishments, and surveys of agriculture, which are similar in terms of sample design and the availability of auxiliary data from a quinquennial census. Ideas for future research include investigating alternative data transformations and discrepancy measures and developing hierarchical Bayesian models for time series and count data.
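
As a toy numerical illustration of the composite idea described above (not ASPEP's actual methodology; all figures are made up), the sketch below combines an unbiased but noisy direct estimate with a biased but stable synthetic estimate, using the weight that minimises mean squared error under an independence assumption.

    import numpy as np

    def composite(direct, synthetic, var_direct, mse_synthetic):
        # weight minimising the MSE of w*direct + (1-w)*synthetic,
        # assuming independent errors and an unbiased direct estimator
        w = mse_synthetic / (var_direct + mse_synthetic)
        return w * direct + (1.0 - w) * synthetic, w

    rng = np.random.default_rng(0)
    true_total, n_rep = 1000.0, 10000
    direct = true_total + rng.normal(0.0, 100.0, n_rep)              # unbiased, noisy
    synthetic = true_total + 60.0 + rng.normal(0.0, 20.0, n_rep)     # biased, stable
    comp, w = composite(direct, synthetic, 100.0 ** 2, 60.0 ** 2 + 20.0 ** 2)
    for name, est in [("direct", direct), ("synthetic", synthetic), ("composite", comp)]:
        print(name, np.mean((est - true_total) ** 2))   # composite has the smallest MSE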
26

On two-color monotonic self-equilibrium urn models

Gao, Shuyang 08 June 2016
In this study, we focus on a class of two-color balanced urns with multiple drawings that has the property of monotonic self-equilibrium. We give the definition of a monotonic self-equilibrium urn model by specifying the form of its replacement matrix. At each step, a sample of size m ≥ 1 is drawn from the urn, and the replacement rule prespecified by a matrix is applied. The idea is to support whichever color has fewer counts in the sample. Intuitively, for any urn scheme within this class, the proportions of white and blue balls in the urn tend to be equal asymptotically. We observe by simulation that, when n is large, the number of white balls in the urn within this class is around half of the total number of balls in the urn on average and is normally distributed. Within the class of affine urn schemes, we specify subclasses that have the property of monotonic self-equilibrium, and derive limiting behavior of the number of white balls using existing results. The class of non-affine urn schemes is not yet well developed in the literature. We work on a subclass of non-affine urn models that has the property of monotonic self-equilibrium. For the special case that one ball is added into the urn at each step, we derive limiting behavior of the expectation and the variance and prove convergence in probability for the proportion of white balls in the urn. An optimal strategy on urn balancing and application of monotonic self-equilibrium urn models are also briefly discussed.
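
A simulation sketch of one such scheme (the particular replacement rule and parameters are assumptions chosen for illustration, not the exact matrices studied in the thesis): at each step a sample of size m is drawn without replacement and one ball of the colour under-represented in the sample is added, so the urn is balanced and self-correcting.

    import numpy as np

    def simulate(n_steps=200000, m=3, white=5, blue=5, seed=0):
        rng = np.random.default_rng(seed)
        for _ in range(n_steps):
            # number of white balls in a size-m sample drawn without replacement
            w_in_sample = rng.hypergeometric(white, blue, m)
            # support whichever colour has fewer counts in the sample (m odd: no ties)
            if w_in_sample < m - w_in_sample:
                white += 1
            else:
                blue += 1
        return white / (white + blue)

    print(simulate())   # proportion of white balls; expected to be close to 1/2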
27

On the sequential test per MIL-STD-781 and new, more efficient test plans.

Li, Dingjun. January 1990 (has links)
The sequential probability ratio test is an efficient test procedure compared to the fixed sample size test procedure in the sense that it minimizes the average sample size needed for terminating the experiment at the two specified hypotheses, i.e., at H₀: θ = θ₀ and H₁: θ = θ₁. However, this optimum property does not hold for the values of the testing parameter other than these two hypotheses, especially for those with values between these two. Also, the estimation following a sequential test is considered to be difficult, and the usual maximum likelihood estimate is in general biased. The sequential test plans given in MIL-STD-781 do not meet their nominal test risk requirements and the truncation of these test plans is determined by the theory for a fixed sample size test. The contributions of this dissertation are: (1) The distribution of the successive sums of samples from a generalized sequential probability ratio test in the exponential case has been obtained. An exact analysis method for the generalized sequential probability ratio test has been developed as well as its FORTRAN programs based on this distribution. (2) A set of improved sequential probability ratio test plans for testing the mean for the exponential distribution has been established. The improved test plan can meet the test risk requirements exactly and can approximately minimize the maximum average waiting time. (3) The properties of the estimates after a sequential test have been investigated and a bias-reduced estimate has been recommended. The general method for constructing the confidence interval after a sequential test has been studied and its existence and uniqueness have been proved in the exponential case. (4) Two modifications to Wald's sequential probability ratio test, the triangular test and the repeated significance test, in the exponential case have also been studied. The results show that the triangular test is very close to the optimal test in terms of minimizing the maximum average sample size, and a method for constructing the triangular test plan has been developed.
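
For illustration only (the hypothesised means and risks are arbitrary example values, and the MIL-STD-781 truncation rules are not reproduced here), a sketch of Wald's sequential probability ratio test for an exponential mean with the classical boundary approximations A ≈ (1 - β)/α and B ≈ β/(1 - α):

    import numpy as np

    def sprt_exponential(data, theta0, theta1, alpha=0.05, beta=0.10):
        # Wald boundaries (approximate): accept H1 above log A, accept H0 below log B
        log_a = np.log((1.0 - beta) / alpha)
        log_b = np.log(beta / (1.0 - alpha))
        llr = 0.0
        for n, x in enumerate(data, start=1):
            # log-likelihood ratio increment for Exp(mean theta): f(x) = exp(-x/theta)/theta
            llr += np.log(theta0 / theta1) + x * (1.0 / theta0 - 1.0 / theta1)
            if llr >= log_a:
                return "accept H1", n
            if llr <= log_b:
                return "accept H0", n
        return "no decision", len(data)

    rng = np.random.default_rng(0)
    data = rng.exponential(scale=500.0, size=10000)   # true mean equals theta0
    print(sprt_exponential(data, theta0=500.0, theta1=250.0))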
28

Joint Modelling of Longitudinal Quality of Life Measurements and Survival Data in Cancer Clinical Trials

Song, Hui 23 January 2013
In cancer clinical trials, longitudinal Quality of Life (QoL) measurements on a patient may be analyzed by classical linear mixed models but some patients may drop out of the study due to recurrence or death, which causes problems in the application of classical methods. Joint modelling of longitudinal QoL measurements and survival times may be employed to explain the dropout information of longitudinal QoL measurements, and provide more efficient estimation, especially when there is strong association between longitudinal measurements and survival times. Most joint models in the literature assumed a classical linear mixed model for longitudinal measurements, and Cox's proportional hazards model for survival times. The linear mixed model with normal-distribution random effects may not be sufficient to model longitudinal QoL measurements. Moreover, with advances in medical research, long-term survivors may exist, which makes the proportional hazards assumption not suitable for survival times when some censoring times are due to potentially cured patients. In this thesis, we propose new models to analyze longitudinal QoL measurements and survival times jointly. In the first part of this thesis, we develop a joint model which assumes a linear mixed t model for longitudinal measurements and a promotion time cure model for survival data. We link these two models through a latent variable and develop a semiparametric inference procedure. The second part of this thesis considers a special feature of the QoL measurements. That is, they are constrained in an interval (0,1). We propose to take into account this feature by a simplex-distribution model for these QoL measurements. Classical proportional hazards and promotion time cure models are used separately in the two situations, depending on whether a cure fraction is assumed in the data or not. In both cases, we characterize the correlation between the longitudinal measurements and survival times by a shared random effect, and derive a semiparametric penalized joint partial likelihood to estimate the parameters. The above proposed new joint models and estimation procedures are evaluated in simulation studies and applied to the QoL measurements and recurrence times from a clinical trial on women with early breast cancer. / Thesis (Ph.D., Mathematics & Statistics) -- Queen's University, 2013-01-23.
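
To make the cure-model component concrete (an illustration added here, not the thesis's joint model; the Weibull latency distribution and the value of theta are arbitrary), the promotion time cure model has population survival S(t) = exp(-theta * F(t)) for a proper distribution function F, so a fraction exp(-theta) of patients is never at risk, whereas an ordinary proportional-hazards survival curve decays to zero.

    import numpy as np

    def promotion_time_survival(t, theta, cdf):
        # S(t) = exp(-theta * F(t)); plateaus at the cure fraction exp(-theta)
        return np.exp(-theta * cdf(t))

    t = np.linspace(0.0, 10.0, 6)
    theta = 1.2
    weibull_cdf = lambda s: 1.0 - np.exp(-((s / 2.0) ** 1.5))   # assumed latency distribution
    print(promotion_time_survival(t, theta, weibull_cdf))       # levels off near exp(-1.2) ~ 0.30
    print(np.exp(-t / 2.0))                                     # exponential survival, decays to 0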
29

An Introduction to the Cox Proportional Hazards Model and Its Applications to Survival Analysis

Thompson, Kristina 29 January 2015
Statistical modeling of lifetime data, or survival analysis, is studied in many fields, including medicine, information technology and economics. This type of data gives the time to a certain event, such as death in studies of cancer treatment, or time until a computer program crashes. Researchers are often interested in how covariates affect the time to event and wish to determine ways of incorporating such covariates into statistical models. Covariates are explanatory variables that are suspected to affect the lifetime of interest. Lifetime data are typically subject to censoring and this fact needs to be taken into account when choosing the statistical model. D.R. Cox (1972) proposed a statistical model that can be used to explore the relationship between survival and various covariates and takes censoring into account. This is called the Cox proportional hazards (PH) model. In particular, the model will be presented and estimation procedures for parameters and functions of interest will be developed. Statistical properties of the resulting estimators will be derived and used in developing inference procedures.
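
As an illustration of the model being introduced (a sketch under simplifying assumptions: a single covariate, no tied event times, simulated data; not the author's own development), the Cox partial log-likelihood can be written directly and maximised by a crude grid search:

    import numpy as np

    def cox_partial_loglik(beta, times, events, x):
        # sum over observed events of beta*x_i - log( sum_{j at risk at t_i} exp(beta*x_j) )
        ll = 0.0
        for i in range(len(times)):
            if events[i] == 1:
                at_risk = times >= times[i]
                ll += beta * x[i] - np.log(np.exp(beta * x[at_risk]).sum())
        return ll

    rng = np.random.default_rng(0)
    n, true_beta = 300, 0.7
    x = rng.normal(size=n)
    t_event = rng.exponential(scale=np.exp(-true_beta * x))    # hazard proportional to exp(beta*x)
    t_cens = rng.exponential(scale=1.0, size=n)
    times = np.minimum(t_event, t_cens)
    events = (t_event <= t_cens).astype(int)                   # 1 = event observed, 0 = censored

    grid = np.linspace(-2.0, 2.0, 401)
    beta_hat = grid[np.argmax([cox_partial_loglik(b, times, events, x) for b in grid])]
    print(beta_hat)   # should land near the true value 0.7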
30

A Clinical Decision Support System for the Prevention of Genetic-Related Heart Disease

Saguilig, Lauren G. 13 June 2017
Drug-induced long QT syndrome (diLQTS) is a common adverse drug reaction characterized by rapid and erratic heartbeats that may instigate fainting or seizures. The onset of diLQTS can lead to torsades de pointes (TdP), a specific form of abnormal heart rhythm that often leads to sudden cardiac arrest and death. This study aims to understand the genetic similarities between diLQTS and TdP to develop a clinical decision support system (CDSS) to aid physicians in the prevention of TdP. Highly accurate classification algorithms, including random forests, shrunken centroid, and diagonal linear discriminant analysis, are considered to build a prediction model for TdP. With a feasible set of markers, we predict TdP classifications with an accuracy above 90%. The methodology used in this study can be extended to other biomedical high-dimensional data.
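
A schematic of the classification step (the synthetic data stand in for the genetic markers; features, sample size, and classifier settings are placeholder assumptions, not the study's): a random forest evaluated by cross-validated accuracy on a high-dimensional matrix.

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    # stand-in for a marker matrix: 200 subjects, 500 features, only a few informative
    X, y = make_classification(n_samples=200, n_features=500, n_informative=10, random_state=0)
    clf = RandomForestClassifier(n_estimators=500, random_state=0)
    scores = cross_val_score(clf, X, y, cv=5, scoring="accuracy")
    print(scores.mean())   # cross-validated accuracy of the random forest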
