111
En pärla gör ingen kvinna? : En statistisk jämförelse mellan osteologisk bedömda gravar och dess gravgåvor under yngre järnåldern / A bead does not make a woman? : A statistical comparison between osteologically assessed graves and their grave goods in the late Iron Age. Lagerholm, Eva, January 2009.
I have statistically analysed material from 228 graves from the late Iron Age in the Mälardalen area. For each grave I recorded the incidence of combs, knives, beads, weapons, whetstones and Thor's hammer rings (torshammarsringar). I found that beads are over-represented in women's graves and whetstones in men's graves, and weapons occurred only in male graves. My statistical hypothesis test (Z-test) gave no indication that a grave containing more than three beads should define the grave of a woman. A grave containing a large number of beads, more than 20, I do however consider to indicate female gender. Combs, knives and Thor's hammer rings are considered gender neutral.
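A minimal sketch of a two-proportion Z-test of the kind used in such a comparison, with hypothetical counts and function names rather than the thesis data:

```python
import math

def two_proportion_z_test(x1, n1, x2, n2):
    """Z-test for a difference between two proportions.

    x1, x2 : number of "successes" (e.g. graves containing beads)
    n1, n2 : group sizes (e.g. female and male graves)
    Returns the Z statistic and a two-sided p-value.
    """
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)                 # pooled proportion under H0
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # two-sided p-value from the standard normal CDF
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical counts, NOT the thesis data: 60 of 90 female graves and
# 35 of 110 male graves contain at least one bead.
z, p = two_proportion_z_test(60, 90, 35, 110)
print(f"Z = {z:.2f}, p = {p:.4f}")
```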
112
Adaptation des méthodes d’apprentissage aux U-statistiques / Adapting machine learning methods to U-statistics. Colin, Igor, 24 November 2016.
With the increasing availability of large amounts of data, computational complexity has become a keystone of many machine learning algorithms. Stochastic optimization algorithms and distributed/decentralized methods have been widely studied over the last decade and provide increased scalability for optimizing an empirical risk that is separable in the data sample. Yet, in a wide range of statistical learning problems, the risk is accurately estimated by U-statistics, i.e., functionals of the training data with low variance that take the form of averages over d-tuples. We first tackle the problem of sampling for empirical risk minimization. We show that empirical risks can be replaced by drastically computationally simpler Monte-Carlo estimates based on O(n) terms only, usually referred to as incomplete U-statistics, without damaging the learning rate. We establish uniform deviation results, and numerical examples show that such an approach surpasses more naive subsampling techniques. We then focus on decentralized estimation, where the data sample is distributed over a connected network. We introduce new synchronous and asynchronous randomized gossip algorithms which simultaneously propagate data across the network and maintain local estimates of the U-statistic of interest. We establish convergence rate bounds with explicit data- and network-dependent terms.
Finally, we deal with the decentralized optimization of functions that depend on pairs of observations. Similarly to the estimation case, we introduce a method based on concurrent local updates and data propagation. Our theoretical analysis reveals that the proposed algorithms preserve the convergence rate of centralized dual averaging up to an additive bias term. Our simulations illustrate the practical interest of our approach.
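A minimal sketch of an incomplete (Monte-Carlo) U-statistic for a pairwise kernel, using an illustrative variance kernel and synthetic data rather than anything from the thesis: instead of averaging the kernel over all n(n-1)/2 pairs, it averages over a fixed number of randomly drawn pairs, so the cost scales with the number of sampled terms rather than n².

```python
import itertools
import random

def complete_u_statistic(data, kernel):
    """Average of kernel(x_i, x_j) over all distinct pairs (degree-2 U-statistic)."""
    pairs = list(itertools.combinations(data, 2))
    return sum(kernel(x, y) for x, y in pairs) / len(pairs)

def incomplete_u_statistic(data, kernel, n_terms, rng=random):
    """Monte-Carlo (incomplete) estimate based on n_terms randomly sampled pairs."""
    n = len(data)
    total = 0.0
    for _ in range(n_terms):
        i, j = rng.sample(range(n), 2)          # pair drawn without replacement
        total += kernel(data[i], data[j])
    return total / n_terms

# Example kernel: h(x, y) = (x - y)^2 / 2, whose expectation is the variance.
kernel = lambda x, y: 0.5 * (x - y) ** 2
sample = [random.gauss(0.0, 1.0) for _ in range(1000)]

print("complete  :", complete_u_statistic(sample, kernel))
print("incomplete:", incomplete_u_statistic(sample, kernel, n_terms=len(sample)))
```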
113
Model for Bathtub-Shaped Hazard Rate: Monte Carlo Study. Leithead, Glen S., 01 May 1970.
A new model developed for the entire bathtub-shaped hazard rate curve was evaluated for its usefulness as a method of reliability estimation. The model is of the form
F(t) = 1 - exp[-(θ₁t^L + θ₂t + θ₃t^M)],
where the shape exponents L and M were assumed known.
The estimate of reliability obtained from the new model was compared with the traditional restricted sample estimate for four different time intervals and was found to have less bias and variance for all time points.
This was a Monte Carlo study, and the generated data showed that the new model has much potential as a method for estimating reliability. (51 pages)
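An illustrative sketch of the model above, using made-up parameter values θ₁, θ₂, θ₃, L, M (not those from the study): it evaluates the reliability R(t) = 1 - F(t) = exp[-(θ₁t^L + θ₂t + θ₃t^M)] and checks it against a Monte Carlo sample of failure times drawn by numerically inverting F.

```python
import math
import random

# Illustrative parameters only; L < 1 gives a decreasing early hazard and
# M > 1 an increasing wear-out hazard, producing a bathtub shape overall.
THETA1, THETA2, THETA3 = 0.5, 0.05, 0.001
L, M = 0.5, 3.0

def cumulative_hazard(t):
    return THETA1 * t**L + THETA2 * t + THETA3 * t**M

def reliability(t):
    """R(t) = 1 - F(t) = exp(-H(t)) for the three-term bathtub model."""
    return math.exp(-cumulative_hazard(t))

def sample_failure_time(rng=random, t_max=1000.0, tol=1e-6):
    """Draw a failure time by inverting F: solve H(t) = -log(1 - U) by bisection."""
    target = -math.log(1.0 - rng.random())
    lo, hi = 0.0, t_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cumulative_hazard(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

times = [sample_failure_time() for _ in range(20000)]
for t in (1.0, 5.0, 10.0):
    empirical = sum(ft > t for ft in times) / len(times)
    print(f"t={t:5.1f}  model R(t)={reliability(t):.4f}  Monte Carlo={empirical:.4f}")
```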
114
Statistical Modeling, Exploration, and Visualization of Snow Water Equivalent Data. Odei, James Beguah, 01 May 2014.
Due to a continual increase in the demand for water as well as an ongoing regional drought, there is an imminent need to monitor and forecast water resources in the Western United States. In particular, water resources in the Intermountain West rely heavily on snow water storage. Improving seasonal forecasts of snowpack and considering new techniques would therefore allow water resources to be managed more effectively throughout the entire water-year. Many available models used in forecasting snow water equivalent (SWE) measurements require delicate calibration.
In contrast to the physical SWE models most commonly used for forecasting, we offer a statistical model. We present a data-based statistical model that characterizes seasonal snow water equivalent in terms of a nested time-series, with the large scale focusing on the inter-annual periodicity of dominant signals and the small scale accommodating seasonal noise and autocorrelation. This model provides a framework for independently estimating the temporal dynamics of SWE for the various snow telemetry (SNOTEL) sites. We use SNOTEL data from ten stations in Utah over 34 water-years to implement and validate this model.
This dissertation has three main goals: (i) developing a new statistical model to forecast SWE; (ii) bridging existing R packages into a new R package to visualize and explore spatial and spatio-temporal SWE data; and (iii) applying the newly developed R package to SWE data from Utah SNOTEL sites and the Upper Sheep Creek site in Idaho as case studies.
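One hedged reading of the nested time-series idea, sketched with synthetic data rather than SNOTEL measurements and not claiming to reproduce the dissertation's model or R package: an annual harmonic fitted by least squares for the large-scale signal, with an AR(1) coefficient estimated from the residuals for the small scale.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic daily "SWE-like" series over 10 water-years: an annual cycle plus
# autocorrelated noise.  Real SNOTEL data would replace this block.
days = np.arange(10 * 365)
annual = 2 * np.pi * days / 365.0
truth = 300 + 250 * np.sin(annual - 1.8)
noise = np.zeros_like(truth)
for t in range(1, len(noise)):                 # AR(1) noise with phi = 0.9
    noise[t] = 0.9 * noise[t - 1] + rng.normal(0, 10)
swe = np.clip(truth + noise, 0, None)

# Large scale: least-squares fit of a mean plus the first annual harmonic.
X = np.column_stack([np.ones_like(annual), np.sin(annual), np.cos(annual)])
beta, *_ = np.linalg.lstsq(X, swe, rcond=None)
seasonal_fit = X @ beta

# Small scale: AR(1) coefficient of the residuals via the lag-1 autocorrelation.
resid = swe - seasonal_fit
phi_hat = np.corrcoef(resid[:-1], resid[1:])[0, 1]

print("harmonic coefficients:", np.round(beta, 1))
print("estimated AR(1) coefficient:", round(float(phi_hat), 3))
```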
115
A Search for Low-Amplitude Variability Among Population I Main Sequence Stars. Rose, Michael Benjamin, 06 July 2006.
The detection of variable stars in open clusters is an essential component of testing stellar structure and evolution theories. The ability to detect low-amplitude variability among cluster members is directly related to the quality of the photometric results. Point Spread Function (PSF) fitting is the best method available for measuring accurate magnitudes within crowded fields of stars, while high-precision differential photometry is the preferred technique for removing the effects of atmospheric extinction and variable seeing. In the search for new variable stars among hundreds or thousands of stars, the Robust Median Statistic (RoMS) proves more effective for finding low-amplitude variables than the traditional error-curve approach. The well-established DAOPHOT program was used to perform PSF fitting, while the programs CLUSTER and RoMS were created to carry out the high-precision differential photometry and calculate the RoMS, respectively, for the open clusters NGC 225, NGC 559, NGC 6811, NGC 6940, NGC 7142, and NGC 7160. Twenty-two new variables and eighty-seven suspected variable stars were discovered, and time-series data of the new variables are presented.
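For orientation, a sketch of one common formulation of the Robust Median Statistic for a single star's differential light curve; the exact definition and variability threshold used in the thesis may differ, and the magnitudes and uncertainties below are placeholders.

```python
import statistics

def robust_median_statistic(mags, errs):
    """RoMS: sum of |m_i - median(m)| / sigma_i, divided by (N - 1).

    Values well above ~1 are often taken to flag candidate variables, since a
    non-variable star's scatter should be comparable to its photometric errors.
    """
    med = statistics.median(mags)
    n = len(mags)
    return sum(abs(m - med) / e for m, e in zip(mags, errs)) / (n - 1)

# Placeholder differential magnitudes and per-point uncertainties.
constant_star = [0.001, -0.002, 0.000, 0.003, -0.001, 0.002]
variable_star = [0.010, -0.015, 0.022, -0.018, 0.012, -0.020]
errors = [0.004] * 6

print("constant candidate:", round(robust_median_statistic(constant_star, errors), 2))
print("variable candidate:", round(robust_median_statistic(variable_star, errors), 2))
```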
116
The Energy Goodness-of-fit Test for Univariate Stable Distributions. Yang, Guangyuan, 26 July 2012.
No description available.
117
A New Approach to ANOVA Methods for Autocorrelated Data. Liu, Gang, January 2016.
No description available.
118
The measurement and decomposition of achievement equity: an introduction to its concepts and methods, including a multiyear empirical study of sixth grade reading scores. Rogers, Francis H., III, 29 September 2004.
No description available.
119
Pokerboten / The Poker Bot. Nilsson, Marcus; Borgström, Stefan, January 2011.
The aim of this thesis is to examine theories and develop ideas on how to build a bot that plays poker. An important topic studied is artificial intelligence and how an AI can be developed for a bot that replaces a human poker player playing over a network. The study gives an insight into the rules of Texas Hold’em and also covers the theory of relevant statistics, probability and odds. The results of this study consist of algorithms that can be used in the development of a bot that plays poker at a table with ten players.
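A small illustration of the kind of pot-odds arithmetic such a bot relies on, not an algorithm from the thesis itself; the win probability is simply a supplied number, which a real bot would estimate by simulation or from hand-strength tables.

```python
def pot_odds(pot_size, amount_to_call):
    """Fraction of the final pot the caller must contribute."""
    return amount_to_call / (pot_size + amount_to_call)

def call_has_positive_ev(win_probability, pot_size, amount_to_call):
    """Calling is profitable (ignoring future betting) when the estimated
    win probability exceeds the pot odds."""
    return win_probability > pot_odds(pot_size, amount_to_call)

def expected_value_of_call(win_probability, pot_size, amount_to_call):
    """EV of calling: win the current pot with probability p, lose the call otherwise."""
    return win_probability * pot_size - (1 - win_probability) * amount_to_call

# Hypothetical spot: 200 chips in the pot, 50 to call, ~30% estimated equity.
p, pot, call = 0.30, 200, 50
print("pot odds:", round(pot_odds(pot, call), 3))
print("call?   :", call_has_positive_ev(p, pot, call))
print("EV      :", expected_value_of_call(p, pot, call))
```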
120
Improving the accuracy of statistics used in de-identification and model validation (via the concordance statistic) pertaining to time-to-event data. Caetano, Samantha-Jo, January 2020.
Time-to-event data are very common in medical research. Clinicians and patients therefore need analyses of these data to be accurate, as they are often used to interpret disease-screening results, inform treatment decisions, and identify at-risk patient groups (i.e., by sex, race, gene expression, etc.). This thesis tackles three statistical issues pertaining to time-to-event data.
The first issue was encountered in an Institute for Clinical and Evaluative Sciences lung cancer registry data set, which was de-identified by censoring patients at an earlier date. This resulted in an underestimate of the observed times of censored patients. Five methods were proposed to account for the underestimation incurred by de-identification. A simulation study was then conducted to compare how effectively each method reduced the bias and mean squared error, and improved the coverage probabilities, of four different Kaplan-Meier (KM) estimates. The simulation results demonstrated that situations with relatively large numbers of censored patients required methodology with larger perturbation. In these scenarios, the fourth proposed method (which perturbed censored times such that they were censored in the final year of the study) yielded estimates with the smallest bias and mean squared error and the largest coverage probability. Conversely, when there were smaller numbers of censored patients, any manipulation of the altered data set worsened the accuracy of the estimates.
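A hedged sketch of one reading of the fourth method, with synthetic data: censored observation times are moved into the final year of follow-up before the Kaplan-Meier curve is computed. The function names and the exact perturbation scheme are assumptions for illustration, not the thesis's implementation.

```python
import random

def kaplan_meier(times, events):
    """Return (time, survival) steps of the Kaplan-Meier estimator.

    times  : observed follow-up times
    events : 1 if the event was observed, 0 if censored
    """
    survival, curve = 1.0, []
    for t in sorted(set(times)):
        d = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        n = sum(1 for ti in times if ti >= t)   # number still at risk at time t
        if d > 0:
            survival *= 1.0 - d / n
            curve.append((t, survival))
    return curve

def perturb_censored_to_final_year(times, events, study_end, rng=random):
    """One reading of 'method 4': move each censored time to a random point
    in the final year of the study (assumes time is measured in years)."""
    return [ti if ei == 1 else rng.uniform(study_end - 1.0, study_end)
            for ti, ei in zip(times, events)]

# Synthetic example: 5-year study, exponential event times, censored at 3 years
# to mimic de-identification by an earlier censoring date.
rng = random.Random(1)
true_times = [rng.expovariate(0.2) for _ in range(300)]
events = [1 if t <= 3.0 else 0 for t in true_times]
observed = [min(t, 3.0) for t in true_times]

perturbed = perturb_censored_to_final_year(observed, events, study_end=5.0, rng=rng)
print(kaplan_meier(perturbed, events)[-5:])
```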
The second issue arises when investigating model validation via the concordance (c) statistic. The c-statistic was originally intended for measuring the accuracy of statistical models that assess the risk associated with a binary outcome: it estimates the proportion of patient pairs in which the patient with the higher predicted risk experienced the event. The definition of a c-statistic cannot be uniquely extended to time-to-event outcomes, so many proposals have been made. The second project developed a parametric c-statistic which assumes that the true survival times are exponentially distributed, in order to invoke the memoryless property. A simulation study was conducted which included a comparative analysis against two other time-to-event c-statistics; three different definitions of concordance in the time-to-event setting were compared, as were three different c-statistics. The c-statistic developed by the authors yielded the smallest bias when censoring is present in the data, even when the exponential parametric assumption does not hold, and appears to be the most robust to censored data. It is therefore recommended for validating prediction models applied to censored data.
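For comparison, a sketch of a widely used censoring-aware concordance of the same general form (a Harrell-type c-statistic over comparable pairs); it is not the parametric exponential c-statistic developed in the thesis, and the toy numbers are placeholders.

```python
def harrell_c_statistic(times, events, risks):
    """Harrell-type concordance for censored data.

    A pair (i, j) is comparable when the earlier observed time is an event.
    It is concordant when the patient with the earlier event has the higher
    predicted risk; ties in predicted risk count as half-concordant.
    """
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable

# Toy example: higher risk scores should correspond to earlier events.
times = [2.0, 5.0, 3.0, 8.0, 6.0]
events = [1, 0, 1, 1, 0]          # 0 = censored
risks = [0.9, 0.2, 0.3, 0.1, 0.4]
print(round(harrell_c_statistic(times, events, risks), 3))
```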
The third project developed and assessed the appropriateness of an empirical time-to-event c-statistic derived by estimating the survival times of censored patients via the EM algorithm. A simulation study was conducted for various sample sizes, censoring levels and correlation rates. A non-parametric bootstrap was employed, and the mean and standard error of the bias of four different time-to-event c-statistics were compared, including the empirical EM c-statistic developed by the authors. The newly developed c-statistic yielded the smallest mean bias and standard error in all simulated scenarios and appears to be the most appropriate for estimating the concordance of a time-to-event model. It is therefore recommended for validating prediction models applied to censored data. / Thesis / Doctor of Philosophy (PhD)
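A sketch of one way such an EM-style completion could look under an exponential working model, which is an assumption made here rather than the authors' actual construction: censored times are replaced by their conditional expectations c + 1/λ, λ is re-estimated, and the two steps alternate until convergence; the completed times could then feed an ordinary concordance calculation like the one sketched above.

```python
def em_impute_exponential(times, events, n_iter=200, tol=1e-10):
    """Impute censored survival times under an exponential working model.

    E-step: for a censored time c, E[T | T > c] = c + 1/lambda (memoryless).
    M-step: lambda = n / sum(currently imputed times).
    Returns (imputed_times, lambda_hat).
    """
    lam = len(times) / sum(times)            # naive start that ignores censoring
    for _ in range(n_iter):
        imputed = [t if e == 1 else t + 1.0 / lam for t, e in zip(times, events)]
        new_lam = len(times) / sum(imputed)
        if abs(new_lam - lam) < tol:
            break
        lam = new_lam
    imputed = [t if e == 1 else t + 1.0 / lam for t, e in zip(times, events)]
    return imputed, lam

# Same toy data as above; the completed times could replace the censored
# observations in a concordance calculation.
times = [2.0, 5.0, 3.0, 8.0, 6.0]
events = [1, 0, 1, 1, 0]
completed, lam_hat = em_impute_exponential(times, events)
print("lambda_hat:", round(lam_hat, 4))
print("completed times:", [round(t, 2) for t in completed])
```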