Global ETD Search

91	Analys av hörnsekvenser i svensk elitfotboll : Gruppering av hörnsekvenser och utvärdering av sannolikhet för skott med logistisk hierarkisk modellstruktur / Analysis of corner sequences in the top Swedish football leagues : Clustering of corner sequences and evaluation of the probability of shot with logistical hierarchical model structure Rydström, Sidney; Lindén, Jakob January 2020 Sportanalys definieras av Alamar (2013) som användning av historisk data för att applicera modeller som kan ge information till beslutstagare inom en viss organisation. Det ger dem möjlighet att assistera sin organisation för att få en sportslig fördel. I den här studien utförs sportanalys, mer specifikt analyseras hörnsekvenser inom svensk elitfotboll. En hörnsekvens är den sekvens av händelser som sker från att bollen sätts i spel från hörnans startposition tills det att något av följande villkor uppfylls: 8 händelser sker givet att hörnan slås kort; 6 händelser sker givet att hörnan slås långt; 15 sekunder passerar; försvarande lag tar över bollen; något lag utför ett regelbrott; skott utförs av attackerande lag. Datamaterialet som används är framtaget av företaget Wyscout och tillhandahållet av Football Analytics Sweden AB. De ligor och säsonger som betraktas är de svenska herrligorna Allsvenskan och Superettan för säsongerna 2017, 2018 och 2019. I datamaterialet erhålls information om varje händelse som sker under matchen. Utifrån information om händelsen samt koordinater om var händelsen sker framställs variabler som ska kunna beskriva vad som sker inom en hörnsekvens. Syftet med studien är att först identifiera hörnsekvenser med liknande egenskaper och gruppera dem. Utifrån gruppindelningen undersöks sedan sannolikheten för att en hörnsekvens leder till skott samt vad som påverkar sannolikheten.
Algoritmen Partitioning Around Medoids (PAM) används med avståndsmåttet Gower och utvärderingsmåttet silhouette för att identifiera följande fem hörnsekvenstyper: (1) Utåtskruvade hörnor från vänsterhörn med tendens mot främre stolpen och relativt nära mållinjen. (2) Inåtskruvade hörnor från vänsterhörn med tendens mot främre stolpen och längre förflyttning ut från mållinjen. (3) Utåtskruvade hörnor från högerhörn med tendens mot främre stolpen och relativt nära mållinjen. (4) Korta varianter som har längre varaktighet, innefattar fler händelser och involverar fler spelare. (5) Inåtskruvade hörnor från högerhörn med tendens mot främre stolpen och längre förflyttning ut från mållinjen. Betraktas förekomsten av skott i datamaterialet givet klusterstrukturen konstateras att hörnsekvenstyp 4 i störst utsträckning lett till skott med förekomsten 19 procent inom klustret. Hörnsekvenstyperna 2 och 5 är något sämre med respektive 18 procent av hörnsekvenserna som lett till skott. Med dessa hörnsekvenstyper i fokus anpassas flera Bayesianska hierarkiska logitmodeller för att undersöka sannolikheten för att en hörnsekvens leder till skott givet de framtagna variablerna. Vid skapandet av modellerna undersöktes om en hierarkisk modellstruktur var behövlig för att undersöka sannolikheten för skott. Slutsatsen blev att det är väsentligt att tillämpa en hierarkisk modellstruktur. Av vald modell så dras slutsatsen att det som påverkar sannolikheten att komma till skott allra mest, med avseende de variabler som undersökts, är antalet händelser som sker i hörnsekvensen. Den hörnsekvenstyp som påverkas mest av antalet händelser är den korta varianten. Det diskuteras om hur det kan vara problematiskt att undersöka den linjära påverkan på log-oddset. Detta eftersom påverkan på sannolikheten för skott inte är densamma för en ökning mellan en och två händelser som mellan tre och fyra händelser.
Det är även näst intill omöjligt att komma till skott på första händelsen i hörnsekvensen då händelsen utgörs av att hörnan slås. / Sports analysis is defined by Alamar (2013) as the management of structured historical data, the application of analytical models that utilize that data, and the use of information systems to inform decision makers and enable them to help their organization in gaining a competitive advantage on the field of play. This study focuses on sports analysis, more specifically corner sequences in Swedish elite football. A corner sequence is defined as the sequence of events that occur after the ball has been put into play from the corner's start position until one of the following conditions is met: 8 events occur, given that a short corner is played; 6 events occur, given that a long corner is played; 15 seconds pass; the defending team takes over the ball; either team commits a foul; the attacking team takes a shot. The data set used comes from Wyscout and is provided by Football Analytics Sweden AB. The data consist of games from the top Swedish men's football leagues, Allsvenskan and Superettan, played in the 2017, 2018 and 2019 seasons. The data provide information about every event that occurs during a game, and each event is classified to describe what happens. The information about each event and its coordinates is then used to produce variables that describe what occurs during a corner sequence. The purpose is to identify corner sequences with similar characteristics and group them together. These groups are then used to examine the probability that a corner sequence leads to a shot, and what influences this probability.
The clustering algorithm Partitioning Around Medoids (PAM) is used with Gower as the dissimilarity measure and silhouette to evaluate the clusters, and the following five clusters are identified: (1) Corners curled away from goal from the left corner with a tendency towards the front post and relatively close to the goal line. (2) Corners curled towards goal from the left corner with a tendency towards the front post and further away from the goal line. (3) Corners curled away from goal from the right corner with a tendency towards the front post and relatively close to the goal line. (4) Short corner variants with longer duration, more events occurring and more players involved. (5) Corners curled towards goal from the right corner with a tendency towards the front post and further away from the goal line. Given the clustering structure, corner sequence type 4 has led to shots most often, with a proportion of 19 percent within the cluster. Corner sequence types 2 and 5 have a slightly lower shot occurrence, at 18 percent each. With these corner sequence types in focus, several hierarchical Bayesian logistic regression models are fitted to analyze the probability that a corner sequence leads to a shot given the produced explanatory variables. When fitting the models, it is examined whether a hierarchical model structure is necessary. The conclusion is that the hierarchical model structure is crucial to the model's performance. The final model indicates that, among the variables examined, the one that most affects the probability of a shot is the number of events that occur during the corner sequence. The corner sequence type most influenced by the number of events is the short corner variant.
The study discusses whether it is problematic to assume that this variable has a linear effect on the log-odds, since the impact on the probability of a shot is not the same for an increase from one to two events as for an increase from three to four events. Furthermore, it is nearly impossible to shoot on the first event of a corner sequence, since that event is the corner kick itself. Bayesian Statistics; cluster analysis; logistical hierarchical model structure; sports analysis; soccer; football; shot; corner; Bayesiansk statistik; klusteranalys; logistisk hierarkisk modellstruktur; sportanalys; fotboll; skott; hörnor; Allsvenskan; Superettan; Probability Theory and Statistics; Sannolikhetsteori och statistik
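The concern about a linear effect on the log-odds can be illustrated numerically. The following is a minimal sketch and not the authors' model: the intercept and slope are made-up values, chosen only to show that a constant step in log-odds produces unequal steps in probability.

```python
import math


def inv_logit(log_odds: float) -> float:
    """Map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical coefficients for illustration; not estimates from the thesis.
INTERCEPT = -4.0
SLOPE_PER_EVENT = 0.8


def shot_probability(n_events: int) -> float:
    """Shot probability when the log-odds are linear in the number of events."""
    return inv_logit(INTERCEPT + SLOPE_PER_EVENT * n_events)


def probability_step(n_from: int, n_to: int) -> float:
    """Change in shot probability when the event count moves from n_from to n_to."""
    return shot_probability(n_to) - shot_probability(n_from)


if __name__ == "__main__":
    # Both steps add the same amount to the log-odds, yet the probability changes differ.
    for n_from, n_to in [(1, 2), (3, 4)]:
        print(n_from, "->", n_to, round(probability_step(n_from, n_to), 4))
```

With these assumed coefficients, the step from three to four events changes the probability by roughly three times as much as the step from one to two events, even though both add the same amount to the log-odds.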
92	Hydroacoustic Quantification of Lake Erie Walleye (Sander vitreus) Distribution and Abundance DuFour, Mark R. 18 October 2017 No description available. Aquatic Sciences; Biology; Ecology; Environmental Science; Freshwater Ecology; Natural Resource Management; Biostatistics
93	Multiscale and meta-analytic approaches to inference in clinical healthcare data Hamilton, Erin Kinzel 29 March 2013 The field of medicine is regularly faced with the challenge of utilizing information that is complicated or difficult to characterize. Physicians often must use their best judgment in reaching decisions or recommendations for treatment in the clinical setting. The goal of this thesis is to use innovative statistical tools in tackling three specific challenges of this nature from current healthcare applications. The first aim focuses on developing a novel approach to meta-analysis when combining binary data from multiple studies of paired design, particularly in cases of high heterogeneity between studies. The challenge is in properly accounting for heterogeneity when dealing with a low or moderate number of studies, and with a rarely occurring outcome. The proposed approach uses a Rasch model for translating data from multiple paired studies into a unified structure that allows for properly handling variability associated with both pair effects and study effects. Analysis is then performed using a Bayesian hierarchical structure, which accounts for heterogeneity in a direct way within the variances of the separate generating distributions for each model parameter. This approach is applied to the debated topic within the dental community of the comparative effectiveness of materials used for pit-and-fissure sealants. The second and third aims of this research both have applications in early detection of breast cancer. The interpretation of a mammogram is often difficult since signs of early disease are often minuscule, and the appearance of even normal tissue can be highly variable and complex. Physicians often have to consider many important pieces of the whole picture when trying to assess next steps. The final two aims focus on improving the interpretation of findings in mammograms to aid in early cancer detection.
When dealing with high-frequency and irregular data, as is seen in most medical images, the behaviors of these complex structures are often difficult or impossible to quantify by standard modeling techniques. But a commonly occurring phenomenon in high-frequency data is that of regular scaling. The second aim in this thesis is to develop and evaluate a wavelet-based scaling estimator that reduces the information in a mammogram down to an informative and low-dimensional quantification of the innate scaling behavior, optimized for use in classifying the tissue as cancerous or non-cancerous. The specific demands for this estimator are that it be robust with respect to distributional assumptions on the data, and with respect to outlier levels in the frequency domain representation of the data. The final aim in this research focuses on enhancing the visualization of microcalcifications that are too small to capture well on screening mammograms. Using scale-mixing discrete wavelet transform methods, the existing detail information contained in a very small and coarse image will be used to impute scaled details at finer levels. These "informed" finer details will then be used to produce an image of much higher resolution than the original, improving the visualization of the object. The goal is to also produce a confidence area for the true location of the shape's borders, allowing for more accurate feature assessment. Through the more accurate assessment of these very small shapes, physicians may be more confident in deciding next steps. Rasch model; Bayesian hierarchical model; Paired data; Meta-analysis; Heterogeneity; Scale-mixing wavelet transform; Sampling distribution; Bootstrapping; Dental sealants; Wavelets; Spectral tools; Breast cancer; Scaling; Wavelet spectra; Weighted regression; Theil; Microcalcification; Diagnostic classification; Image enhancement; Rasch models; Item response theory; Sampling (Statistics); Bootstrap (Statistics)
94	Classification de données multivariées multitypes basée sur des modèles de mélange : application à l'étude d'assemblages d'espèces en écologie / Model-based clustering for multivariate and mixed-mode data : application to multi-species spatial ecological data Georgescu, Vera 17 December 2010 En écologie des populations, les distributions spatiales d'espèces sont étudiées afin d'inférer l'existence de processus sous-jacents, tels que les interactions intra- et interspécifiques et les réponses des espèces à l'hétérogénéité de l'environnement. Nous proposons d'analyser les données spatiales multi-spécifiques sous l'angle des assemblages d'espèces, que nous considérons en termes d'abondances absolues et non de diversité des espèces. Les assemblages d'espèces sont une des signatures des interactions spatiales locales des espèces entre elles et avec leur environnement. L'étude des assemblages d'espèces peut permettre de détecter plusieurs types d'équilibres spatialisés et de les associer à l'effet de variables environnementales. Les assemblages d'espèces sont définis ici par classification non spatiale des observations multivariées d'abondances d'espèces. Les méthodes de classification basées sur les modèles de mélange ont été choisies afin d'avoir une mesure de l'incertitude de la classification et de modéliser un assemblage par une loi de probabilité multivariée. Dans ce cadre, nous proposons : 1. une méthode d'analyse exploratoire de données spatiales multivariées d'abondances d'espèces, qui permet de détecter des assemblages d'espèces par classification, de les cartographier et d'analyser leur structure spatiale. Des lois usuelles, telle que la Gaussienne multivariée, sont utilisées pour modéliser les assemblages, 2. un modèle hiérarchique pour les assemblages d'abondances lorsque les lois usuelles ne suffisent pas.
Ce modèle peut facilement s'adapter à des données contenant des variables de types différents, qui sont fréquemment rencontrées en écologie, 3. une méthode de classification de données contenant des variables de types différents basée sur des mélanges de lois à structure hiérarchique (définies en 2.). Deux applications en écologie ont guidé et illustré ce travail : l'étude à petite échelle des assemblages de deux espèces de pucerons sur des feuilles de clémentinier et l'étude à large échelle des assemblages d'une plante hôte, le plantain lancéolé, et de son pathogène, l'oïdium, sur les îles Aland en Finlande / In population ecology, species spatial patterns are studied in order to infer the existence of underlying processes, such as interactions within and between species, and species response to environmental heterogeneity. We propose to analyze spatial multi-species data by defining species abundance assemblages. Species assemblages are one of the signatures of the local spatial interactions between species and with their environment. Species assemblages are defined here by a non-spatial classification of the multivariate observations of species abundances. Model-based clustering procedures using mixture models were chosen in order to obtain an estimate of the classification uncertainty and to model an assemblage by a multivariate probability distribution. We propose: 1. An exploratory tool for the study of spatial multivariate observations of species abundances, which defines species assemblages by a model-based clustering procedure, and then maps and analyzes the spatial structure of the assemblages. Common distributions, such as the multivariate Gaussian, are used to model the assemblages. 2. A hierarchical model for abundance assemblages which cannot be modeled with common distributions. This model can be easily adapted to mixed-mode data, which are frequent in ecology. 3. A clustering procedure for mixed-mode data based on mixtures of hierarchical models.
Two ecological case studies guided and illustrated this work: the small-scale study of the assemblages of two aphid species on leaves of Citrus trees, and the large-scale study of the assemblages of a host plant, Plantago lanceolata, and its pathogen, the powdery mildew, on the Åland Islands in south-west Finland. Assemblage d'espèces; Coexistence; Données mixtes; Données multivariées spatiales; Modèle gaussien latent; Modèle hiérarchique; Monte Carlo EM; Species assemblages; Finite mixture models; Coexistence; Mixed mode data; Multivariate data; Latent gaussian model; Hierarchical model; Model-based clustering; Spatial data
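The classification uncertainty that motivates the choice of mixture models can be sketched in a few lines. This is an illustrative example under stated assumptions, not the thesis's method: it uses a univariate two-component Gaussian mixture with hypothetical, fixed parameters (in practice they would be estimated, for example by an EM algorithm), and the function names are invented for the example.

```python
import math


def normal_pdf(x: float, mean: float, sd: float) -> float:
    """Density of a univariate Gaussian at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def membership_probabilities(x, weights, means, sds):
    """Posterior probability of each mixture component given x (Bayes' rule)."""
    joint = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
    total = sum(joint)
    return [j / total for j in joint]


def classification_uncertainty(probs):
    """One minus the largest membership probability; 0 means a certain assignment."""
    return 1.0 - max(probs)


if __name__ == "__main__":
    # Hypothetical mixture parameters for two assemblages.
    weights, means, sds = [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0]
    for x in (0.0, 3.0):
        probs = membership_probabilities(x, weights, means, sds)
        print(x, [round(p, 4) for p in probs], round(classification_uncertainty(probs), 4))
```

An observation halfway between the two component means is assigned to either component with probability 0.5, which is the maximum uncertainty for two classes, while an observation far out on one side is classified almost with certainty.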
95	空間相關存活資料之貝氏半參數比例勝算模式 / Bayesian semiparametric proportional odds models for spatially correlated survival data 張凱嵐, Chang, Kai lan Unknown Date 近來地理資訊系統(GIS)之資料庫受到不同領域的統計學家廣泛的研究，以期建立及分析可描述空間聚集效應及變異之模型，而描述空間相關存活資料之統計模式為公共衛生及流行病學上新興的研究議題。本文擬建立多維度半參數的貝氏階層模型，並結合空間及非空間隨機效應以描述存活資料中的空間變異。此模式將利用多變量條件自回歸(MCAR)模型以檢驗在不同地理區域中是否存有空間聚集效應。而基準風險函數之生成為分析貝氏半參數階層模型的重要步驟，本研究將利用混合Polya樹之方式生成基準風險函數。美國國家癌症研究院之「流行病監測及最終結果」(Surveillance Epidemiology and End Results, SEER)資料庫為目前美國最完整的癌症病人長期追蹤資料，包含癌症病人存活狀況、多重癌症史、居住地區及其他分析所需之個人資料。本文將自此資料庫擷取美國愛荷華州之癌症病人資料為例作實證分析，並以貝氏統計分析中常用之模型比較標準如條件預測指標(CPO)、平均對數擬邊際概似函數值(ALMPL)、離差訊息準則(DIC)分別測試其可靠度。 / Geographic Information System (GIS) databases have gained attention among statisticians in different fields, who use them to develop and analyze models that account for spatial clustering and variation. There is an emerging interest in modeling spatially correlated survival data in public health and epidemiologic studies. In this article, we develop Bayesian multivariate semiparametric hierarchical models that incorporate both spatially correlated and uncorrelated frailties to address the question of spatial variation in survival patterns, and we use a multivariate conditionally autoregressive (MCAR) model to detect whether spatial clustering exists across different areas. The baseline hazard function will be modeled semiparametrically using mixtures of finite Polya trees. The SEER (Surveillance Epidemiology and End Results) database from the National Cancer Institute (NCI) provides comprehensive cancer data, including patients' survival times, regional information, and other demographic information. We implement our Bayesian hierarchical spatial models on Iowa cancer data extracted from the SEER database.
We illustrate how to compute the conditional predictive ordinate (CPO), the average log-marginal pseudo-likelihood (ALMPL), and the deviance information criterion (DIC), which are Bayesian criteria for model checking and comparison among competing models. 空間聚集; 比例勝算模型; 貝氏階層模型; 混合Polya樹; 馬可夫鏈蒙地卡羅模擬; 多變量條件自回歸模型; 條件預測指標; 平均對數擬邊際概似函數值; 離差訊息準則; spatial clusters; proportional odds; Bayesian hierarchical model; mixture of Polya trees; Markov Chain Monte Carlo (MCMC); conditional predictive ordinate (CPO); deviance information criterion (DIC)
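As a rough illustration of the first two criteria, the sketch below estimates CPO from posterior draws with the common harmonic-mean Monte Carlo estimator and averages the log-CPO values to obtain ALMPL. It assumes those standard definitions rather than the thesis's exact computation, and the input values are made up; the function names are invented for the example.

```python
import math


def cpo(likelihood_draws):
    """Harmonic-mean estimate of CPO for one observation.

    likelihood_draws holds f(y_i | theta_s), the likelihood of observation i
    evaluated at each posterior draw theta_s (for example from MCMC output).
    """
    mean_inverse = sum(1.0 / v for v in likelihood_draws) / len(likelihood_draws)
    return 1.0 / mean_inverse


def almpl(likelihood_matrix):
    """Average of log(CPO_i) over observations; larger values indicate a better fit."""
    log_cpos = [math.log(cpo(row)) for row in likelihood_matrix]
    return sum(log_cpos) / len(log_cpos)


if __name__ == "__main__":
    # Made-up likelihood values: 2 observations (rows) by 3 posterior draws (columns).
    draws = [
        [0.20, 0.25, 0.22],
        [0.05, 0.08, 0.04],
    ]
    print([round(cpo(row), 4) for row in draws])
    print(round(almpl(draws), 4))
```

The harmonic mean is dominated by draws with very small likelihood values, so this estimator can be unstable in practice; that is a general property of the estimator, not a finding reported in the abstract.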
