71 |
Estimation of State Space Models and Stochastic Volatility
Miller Lira, Shirley 09 1900
My thesis consists of three chapters related to the estimation of state space models and stochastic volatility models.
In the first chapter we develop a computationally efficient procedure for state smoothing in Gaussian linear state space models. We show how to exploit the special structure of state-space models to draw latent states efficiently. We analyze the computational efficiency of Kalman-filter-based methods, the Cholesky Factor Algorithm, and our new method using counts of operations and computational experiments. We show that for many important cases, our method is most efficient. Gains are particularly large for cases where the dimension of observed variables is large or where one makes repeated draws of states for the same parameter values. We apply our method to a multivariate Poisson model with time-varying intensities, which we use to analyze financial market transaction count data.
In the second chapter, we propose a new technique for the analysis of multivariate stochastic volatility models, based on efficient draws of volatility from its conditional posterior distribution. It applies to models with several kinds of cross-sectional dependence. Full VAR coefficient and covariance matrices give cross-sectional volatility dependence. Mean factor structure allows conditional correlations, given states, to vary in time. The conditional return distribution features Student's t marginals, with asset-specific degrees of freedom, and copulas describing cross-sectional dependence. We draw volatility as a block in the time dimension and one at a time in the cross-section. Following McCausland (2012), we use close approximations of the conditional posterior distributions of volatility blocks as Metropolis-Hastings proposal distributions. We illustrate using daily return data for ten currencies. We report results for univariate stochastic volatility models and two multivariate models.
In the third chapter, we evaluate the information contributed by (variations of) realized volatility to the estimation and forecasting of volatility when prices are measured with and without error, using a stochastic volatility model. We consider the viewpoint of an investor for whom volatility is an unknown latent variable and realized volatility is a sample quantity that contains information about it. We use Bayesian Markov chain Monte Carlo (MCMC) methods to estimate the models; these allow the formulation of both the posterior densities of in-sample volatilities and the predictive densities of future volatilities. We then compare the volatility forecasts and hit rates from predictions that use and do not use the information contained in realized volatility. This approach contrasts with most of the empirical realized volatility literature, which most often documents the ability of realized volatility to forecast itself. Our empirical applications use daily index and foreign exchange returns during the 2008-2009 financial crisis.
|
72 |
Understanding patterns of aggregation in count data
Sebatjane, Phuti 06 1900
The term aggregation refers to overdispersion, and the two are used interchangeably in this thesis. To address the prevalence of infectious parasite species faced by most rural livestock farmers, we model the distribution of faecal egg counts of 15 parasite species (13 internal parasites and 2 ticks) common in sheep and goats. Aggregation and excess zeroes are addressed through the use of generalised linear models. The abundance of each species was modelled using six different distributions: the Poisson, negative binomial (NB), zero-inflated Poisson (ZIP), zero-inflated negative binomial (ZINB), zero-altered Poisson (ZAP) and zero-altered negative binomial (ZANB), and their fits were then compared. Excess-zero models (ZIP, ZINB, ZAP and ZANB) were found to fit better than standard count models (Poisson and negative binomial) in all 15 cases. We further investigated how the distributional assumption affects aggregation and zero inflation. Aggregation and zero inflation (measured by the dispersion parameter k and the zero-inflation probability) were found to vary greatly with the distributional assumption; this in turn changed the fixed-effects structure. Serial autocorrelation between adjacent observations was then taken into account by fitting observation-driven time series models to the data. Simultaneously taking into account autocorrelation, overdispersion and zero inflation proved successful, as zero-inflated autoregressive models performed better than zero-inflated models in most cases. Apart from contributing to scientific knowledge, predictability of parasite burden will help farmers with effective disease management interventions. Researchers confronted with the task of analysing count data with excess zeroes can use the findings of this illustrative study as a guideline irrespective of their research discipline. Statistical methods, from model selection and quantification of zero inflation through to accounting for serial autocorrelation, are described and illustrated. / Statistics / M.Sc. (Statistics)
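As a reader's aid, the difference between the zero-inflated and zero-altered (hurdle) Poisson models named in this abstract can be sketched as follows. This is a minimal illustration under the textbook definitions, not the thesis's code; the function names and the parameter names `lam` (Poisson mean) and `pi` (zero probability) are assumptions made for the example.

```python
import math


def poisson_pmf(k, lam):
    """Standard Poisson probability of observing count k with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def zip_pmf(k, lam, pi):
    """Zero-inflated Poisson: with probability pi the count is a structural
    zero; otherwise it is drawn from Poisson(lam), which can also give zero."""
    base = (1.0 - pi) * poisson_pmf(k, lam)
    return pi + base if k == 0 else base


def zap_pmf(k, lam, pi):
    """Zero-altered (hurdle) Poisson: zeros occur with probability pi, and
    positive counts follow a zero-truncated Poisson(lam)."""
    if k == 0:
        return pi
    return (1.0 - pi) * poisson_pmf(k, lam) / (1.0 - math.exp(-lam))


if __name__ == "__main__":
    # Compare the probability of a zero count under each model.
    print(poisson_pmf(0, 2.0), zip_pmf(0, 2.0, 0.3), zap_pmf(0, 2.0, 0.3))
```

In the zero-inflated form the zeros come from two sources, whereas in the zero-altered form all zeros come from the hurdle component; the negative binomial variants replace the Poisson kernel with a negative binomial one.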
|
73 |
Estimação clássica e bayesiana para relação espécie-área com distribuições truncadas no zero [Classical and Bayesian estimation of the species-area relationship with zero-truncated distributions]
Arrabal, Claude Thiago 23 March 2012
Previous issue date: 2012-03-23 / Financiadora de Estudos e Projetos / In ecology, understanding the species-area relationship (SAR) is extremely important for determining species diversity. SARs are fundamental for assessing the impact of the destruction of natural habitats, creating biodiversity maps, and determining the minimum area to preserve. In this study, the number of species is observed in areas of different sizes. Such studies are usually handled in the literature through nonlinear models without assuming any distribution for the data. In this situation, it only makes sense to consider areas in which the species counts are greater than zero. As the dependent variable is a count, we assume that it comes from a known distribution for positive discrete data. We use the zero-truncated Poisson (ZTP) and zero-truncated negative binomial (ZTNB) distributions to represent the distribution of the number of species. To describe the species-area relationship, we consider nonlinear models with asymptotic behavior: negative exponential, Weibull, logistic, Chapman-Richards, Gompertz and beta. The models are fitted by the likelihood method, and a Bayesian approach is proposed that uses auxiliary latent variables to implement the Gibbs sampler. We present a comparative study with simulated data and an application to a real data set.
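To make the zero-truncated Poisson concrete, the sketch below writes out its probability function and a log-likelihood in which the Poisson rate follows a negative exponential curve in area. The parameterization `a * (1 - exp(-b * area))` and all function and parameter names are assumptions chosen for illustration; the abstract does not state which form the thesis uses.

```python
import math


def ztp_pmf(k, lam):
    """Zero-truncated Poisson: a Poisson(lam) conditioned on k >= 1."""
    if k < 1:
        return 0.0
    return math.exp(-lam) * lam ** k / (
        math.factorial(k) * (1.0 - math.exp(-lam))
    )


def ztp_mean(lam):
    """Mean of the zero-truncated Poisson, larger than lam itself."""
    return lam / (1.0 - math.exp(-lam))


def neg_exp_curve(area, a, b):
    """Assumed negative exponential curve: rises towards the asymptote a."""
    return a * (1.0 - math.exp(-b * area))


def ztp_loglik(counts, areas, a, b):
    """Log-likelihood of species counts given areas under the assumed curve."""
    total = 0.0
    for y, area in zip(counts, areas):
        total += math.log(ztp_pmf(y, neg_exp_curve(area, a, b)))
    return total


if __name__ == "__main__":
    # Hypothetical data: species counts observed on plots of increasing area.
    print(ztp_loglik([3, 5, 8], [1.0, 4.0, 10.0], a=10.0, b=0.2))
```

Maximizing `ztp_loglik` over `a` and `b` would correspond to a likelihood fit; a Bayesian fit would instead place priors on the same parameters.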
|
74 |
Non-Parametric Clustering of Multivariate Count Data
Tekumalla, Lavanya Sita January 2017
The focus of this thesis is models for non-parametric clustering of multivariate count data. While there has been significant work in Bayesian non-parametric modelling in the last decade, in the context of mixture models for real-valued data and some forms of discrete data such as multinomial mixtures, there has been much less work on non-parametric clustering of multivariate count data. The main challenges in clustering multivariate counts include choosing a suitable multivariate distribution that adequately captures the properties of the data, for instance handling over-dispersed data or sparse multivariate data, while leveraging the inherent dependency structure between dimensions and across instances to get meaningful clusters.
As the first contribution, this thesis explores extensions to the Multivariate Poisson distribution, proposing efficient algorithms for non-parametric clustering of multivariate count data. While the Poisson is the most popular distribution for count modelling, the Multivariate Poisson often leads to intractable inference and a suboptimal fit of the data. To address this, we introduce a family of models based on the Sparse Multivariate Poisson that exploit the inherent sparsity in multivariate data, reducing the number of latent variables in the formulation of the Multivariate Poisson and leading to a better fit and more efficient inference. We explore Dirichlet process mixture model extensions and temporal non-parametric extensions to models based on the Sparse Multivariate Poisson for practical use of Poisson-based models for non-parametric clustering of multivariate counts in real-world applications. As a second contribution, this thesis addresses moving beyond the limitations of Poisson-based models for non-parametric clustering, for instance in handling over-dispersed data or data with negative correlations. We explore, for the first time, marginal-independent inference techniques based on the Gaussian copula for multivariate count data in the Dirichlet process mixture model setting. This enables non-parametric clustering of multivariate counts without limiting assumptions that usually restrict the marginals to a particular family, such as the Poisson or the negative binomial. This inference technique can also work for mixed data (combinations of counts, binary and continuous data), enabling Bayesian non-parametric modelling to be used for a wide variety of data types. As the third contribution, this thesis addresses modelling a wide range of more complex dependencies, such as asymmetric and tail dependencies, during non-parametric clustering of multivariate count data with vine copula based Dirichlet process mixtures.
While vine copula inference has been well explored for continuous data, it is still a topic of active research for multivariate counts and mixed multivariate data. Inference for multivariate counts and mixed data is a hard problem owing to the ties that arise with discrete marginals. An efficient marginal-independent inference approach based on the extended rank likelihood, building on recent work in the statistics literature, is proposed in this thesis, extending the use of vines to multivariate counts and mixed data in practical clustering scenarios.
This thesis also explores the novel systems application of Bulk Cache Preloading by analysing I/O traces through predictive models for temporal non-parametric clustering of multivariate count data. State-of-the-art techniques in the caching domain are limited to exploiting short-range correlations in memory accesses at millisecond granularity or smaller and cannot leverage long-range correlations in traces. We explore, for the first time, Bulk Cache Preloading, the process of proactively predicting data to load into cache minutes or hours before the actual request from the application, by leveraging longer-range correlations at the granularity of minutes or hours. The relaxed timing constraints enable the development of machine learning techniques tailored for caching. Our approach involves a data aggregation process that converts I/O traces into a temporal sequence of multivariate counts, which we analyse with the temporal non-parametric clustering models proposed in this thesis. While the focus of our thesis is models for non-parametric clustering of discrete data, particularly multivariate counts, we also hope our work on bulk cache preloading paves the way to more interdisciplinary research on using data mining techniques in the systems domain.
As an additional contribution, this thesis addresses multi-level non-parametric admixture modelling for discrete data in the form of grouped categorical data, such as document collections. Non-parametric clustering for topic modelling in document collections, where a document is associated with an unknown number of semantic themes or topics, is well explored with admixture models such as the Hierarchical Dirichlet Process. However, there exist scenarios where a document needs to be associated with themes at multiple levels, where each theme is itself an admixture over themes at the previous level, motivating the need for multi-level admixtures. Consider the example of non-parametric entity-topic modelling, that is, simultaneously learning entities and topics from document collections. This can be realized by modelling a document as an admixture over entities, while entities could themselves be modelled as admixtures over topics. We propose the nested Hierarchical Dirichlet Process to address this gap and apply a two-level version of our model to automatically learn author entities and topics from research corpora.
|
75 |
Abundância de aves de rapina no Cerrado e Pantanal do Mato Grosso do Sul e os efeitos da degradação de hábitat: perspectivas com métodos baseados na detectabilidade / Raptor abundance in the Brazilian Cerrado and Pantanal: insights from detection-based methods
Francisco Voeroes Dénes 12 September 2014
Urbanization and the expansion of agricultural frontiers in the Neotropics are among the main forces driving environmental degradation in natural open habitats. Inference and estimates of abundance are critical for quantifying population dynamics and the impacts of environmental change. Yet imperfect detection and other phenomena that cause zero inflation can induce estimation error and obscure ecological patterns. We examine how detection error and zero inflation in count data of unmarked individuals inform the choice of analytical method for estimating population size. We review established methods (generalized linear models [GLMs] and distance sampling) and emerging methods that use N-mixture models (Royle-Nichols model, and basic, zero-inflated, temporary emigration, beta-binomial, generalized open-population, spatially explicit, single-visit and multispecies) to estimate the abundance of unmarked populations. As a case study, we employed a single-visit N-mixture approach to model roadside raptor count data and investigate how land-use transformations in the Cerrado and Pantanal domains in Brazil have affected the populations of 12 species on a regional scale (>300,000 km²). Methods differ in sampling design requirements, and their suitability will depend on the study species, the scale and objectives of the study, and financial and logistical considerations, which should be evaluated to use funds, time and effort efficiently.
In the case study, detection of all species was influenced by time of day, with effects that follow expectations based on foraging and flying behavior. Closed vegetation and carcasses found during surveys also influenced detection of some species. Abundance of most species was negatively influenced by conversion of natural Cerrado and Pantanal habitats to anthropogenic uses, particularly pastures and soybean and sugar cane plantations, even for generalist species usually considered poor habitat-quality indicators. Protection of the remaining natural habitats is essential to prevent further decline of raptor populations in the study area, especially in the Cerrado domain.
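For readers unfamiliar with N-mixture models, the following sketch shows the marginal likelihood of the basic repeated-visit form at one site: latent abundance N is Poisson with mean `lam`, and each visit's count is Binomial(N, `p`). This is an illustrative assumption, not the single-visit variant applied in the thesis; the function name and the truncation bound `n_max` are hypothetical.

```python
import math


def site_likelihood(counts, lam, p, n_max=100):
    """Marginal likelihood of repeated counts at one site, summing the
    latent abundance N out from max(counts) up to the bound n_max."""
    total = 0.0
    for n in range(max(counts), n_max + 1):
        # Poisson prior on abundance, computed in log space for stability.
        log_prior = -lam + n * math.log(lam) - math.lgamma(n + 1)
        # Binomial detection: each visit detects each individual with prob p.
        detect = 1.0
        for y in counts:
            detect *= math.comb(n, y) * p ** y * (1.0 - p) ** (n - y)
        total += math.exp(log_prior) * detect
    return total


if __name__ == "__main__":
    # Hypothetical counts from three visits to the same site.
    print(site_likelihood([2, 3, 1], lam=4.0, p=0.5))
```

With a single visit and no covariates this likelihood reduces to a Poisson with mean `lam * p`, so abundance and detection cannot be separated; this is why single-visit approaches rely on covariates that affect the two processes differently.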
|
76 |
The space-time distribution of Palearctic Culicoides spp. vectors of Bluetongue disease in Europe / Distribution spatio-temporelle du genre Culicoides, vecteur de la fièvre catarrhale ovine
Rigot, Thibaud 24 October 2011
Abstract: Bluetongue (BT) is a vector-borne infectious disease primarily transmitted to even-toed ungulates by the bite of several Culicoides species. The global distribution of BT can be attributed to the ubiquity of its vectors and to its rapid spread, likely facilitated by human activities (intensification of animal production, transport, changing habitat). During the last decades, BT became established in Southern Europe and more recently emerged in Northern Europe, causing the death of millions of domestic ruminants. At the same time, a Belgian research project was set up to develop remote-sensing tools to study the EPidemiology and Space-TIme dynamicS of infectious diseases (EPISTIS). Within that general framework, this thesis aimed to study the space-time distribution of the main Culicoides vectors occurring in Italy and Belgium, at two different scales. Firstly, we aimed to clarify the role of several eco-climatic factors in the regional-scale distribution of C. imicola over time, based on weekly sampling carried out throughout Italy from 2001 to 2006, and to develop an easy-to-use and reproducible tool that could be widely validated on the basis of earlier vector sampling and freely accessible remote-sensing data. Secondly, we aimed to investigate how Culicoides species are distributed in the fine-scale habitats encountered throughout the agro-ecological landscapes of Belgium, as recent studies have suggested that landscape configuration could explain the spatial distribution of BT. In the first part, we showed that an autoregressive model in which the observed monthly growth rate is predicted by monthly temperature could predict more than 70% of the seasonal variability in C. imicola trap catches. The model reproduced the seasonality, the altitudinal gradient, and the low population activity during the winter.
Incorporating eco-climatic indices such as the Normalized Difference Vegetation Index into the model did not enhance its predictive power. In the second part, we quantified how Culicoides populations are spatially structured in the neighbourhood of farms, and demonstrated the unexpectedly high population levels found in forests. We also showed how four classes of land use could influence the relative abundances of Culicoides species in the agro-ecological landscapes of Belgium. Although in summer BT vectors were abundant in each of the four classes investigated, their relative abundances varied strongly as a function of sex, species and environmental conditions, and we quantified these variations. Finally, we presented a new method to quantify the interference between Onderstepoort light traps, and used it to measure their range of attraction for several of the most common BT vector species in Northern Europe. The model developed for C. imicola in Italy offered encouraging prospects for regional-scale analyses of its distribution over time, although further improvements are required in order to assess the broad-scale ecology of BT vectors throughout Europe. Mapping the abundances of C. imicola in Sardinia highlighted an important lack of reliability attributable to the many land-use classes that are currently not sampled in the vector surveillance carried out across Europe. Together with the novelties presented in the second part and the recent findings establishing that BT could circulate among wild hosts in both epidemiological systems (i.e. in Southern and Northern Europe), we call for more epidemiological and entomological studies at the interface between farms and the surrounding natural habitats.
Lastly, relating the landscape-scale findings for Northern Europe to the daily operating routine of farms highlighted how dramatic a role intensive farming practices could play in maintaining BT within the agro-ecological landscapes studied and in facilitating its circulation between them. Quantifying the amplitude of the risk of disease transmission linked to these practices would require a further, complex modelling approach accounting simultaneously for the diel activity of hosts, mainly resulting from farming activities, the diel activities of different vector species, and the landscape configuration found in contrasting agro-ecological systems. / Doctorate in Agronomic Sciences and Biological Engineering
|
77 |
Bid Forecasting in Public Procurement / Budgivningsmodeller i offentliga upphandlingar
Stiti, Karim; Yape, Shih Jung January 2019
Public procurement amounts to a significant part of Sweden's GDP. Nevertheless, it is an overlooked sector characterized by low digitization and inefficient competition, where bids are submitted based on intuition rather than on proper mathematical tools. This thesis seeks to create a structured approach to bidding in cleaning services by determining factors affecting the participation and pricing decisions of potential buyers. Furthermore, we assess price prediction by comparing multiple linear regression (MLR) models to support vector regression (SVR). In line with previous research in the construction sector, we find that several factors, such as project duration, location and type of contract, have a significant effect on the participation decision in the cleaning sector. One notable deviation is that we do not find contract size to have an impact on the pricing decision. Surprisingly, the performance of MLR is comparable to that of the more advanced SVR models. Stochastic dominance tests on price performance conclude that experienced bidders perform better than their inexperienced counterparts and that companies place more competitive bids in lowest-price tenders than in economically most advantageous tenders (EMAT), indicating that EMAT tenders are regarded as unstructured. However, no significant evidence is found that larger actors perform better in bidding than smaller companies.
|
78 |
Measuring poverty in the EU: investigating and improving the empirical validity in deprivation scales of poverty
Bedük, Selçuk January 2017
Non-monetary deprivation indicators are now widely used for studying and measuring poverty in Europe. However, despite their prevalence, the empirical performance of existing deprivation scales has rarely been examined. This thesis i) identifies possible conceptual problems of existing deprivation scales, such as indexing, missing dimensions and threshold; ii) empirically assesses the extent of possible measurement error related to these conceptual problems; and iii) offers an alternative way of constructing deprivation measures to mitigate the identified conceptual problems. The thesis consists of four stand-alone papers, accompanied by an overarching introduction and conclusion. The first three papers provide evidence on the empirical consequences of the missing dimensions and threshold problems for the measurement and analysis of poverty, while the fourth paper exemplifies a concept-led multidimensional design that can reduce the error introduced by these conceptual problems. The analysis is generally conducted for 25 EU countries using the European Survey of Income and Living Conditions (EU-SILC); only in the second paper is the analysis done for the UK, using the British Household Panel Survey (BHPS).
|