11 |
Méthodes statistiques pour la modélisation des facteurs influençant la distribution et l’abondance de populations : application aux rapaces diurnes nichant en France / Statistical methods for modelling the distribution and abundance of populations: application to raptors breeding in France
Le Rest, Kévin 19 December 2013
In the context of global biodiversity loss, many animal and plant populations are now monitored over large geographic areas and long time periods in order to understand the factors driving the distribution, abundance and trends of populations. These broad-scale surveys make it possible to assess the status of populations quantitatively and to design management plans at the relevant biological scales. However, the statistical analysis of such data raises a number of problems. Typically, generalized linear models (GLMs) are used to link the variable of interest (often presence/absence or counts of the species) to variables suspected of influencing it (e.g. climatic and habitat variables). A major unresolved problem is how to select these explanatory variables when the data are spatially structured. This thesis explores several solutions and proposes an easily applicable method based on a cross-validation procedure that accounts for spatial dependencies. The robustness of the method is evaluated through simulations and several case studies, including count data with higher than expected variability (overdispersion). Particular attention is also given to modelling methods for data with more zeros than expected (zero-inflation). The last part of the thesis applies these methodological results to model the distribution, abundance and trends of diurnal raptors breeding in France.
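The abstract does not describe the cross-validation procedure in detail. As a rough illustration of the general idea only, the sketch below assigns observations to folds by square spatial block, so that nearby (and therefore likely correlated) points are never split between training and test sets. The function names, the block size and the round-robin assignment are assumptions made for this example, not the method proposed in the thesis.

```python
import math
from collections import defaultdict

def spatial_block_folds(coords, block_size, n_folds):
    """Assign each (x, y) point to a CV fold so that all points falling in
    the same square block of side `block_size` share a fold.
    (Hypothetical helper; a generic blocking scheme, not the thesis method.)"""
    blocks = defaultdict(list)
    for i, (x, y) in enumerate(coords):
        key = (math.floor(x / block_size), math.floor(y / block_size))
        blocks[key].append(i)
    folds = [None] * len(coords)
    # Deal whole blocks out to folds in round-robin order (sorted for determinism).
    for b, key in enumerate(sorted(blocks)):
        for i in blocks[key]:
            folds[i] = b % n_folds
    return folds

def train_test_split_for_fold(folds, k):
    """Return (train_indices, test_indices) when fold k is held out."""
    test = [i for i, f in enumerate(folds) if f == k]
    train = [i for i, f in enumerate(folds) if f != k]
    return train, test

# Toy coordinates: six survey points spread over four unit blocks.
coords = [(0.1, 0.2), (0.4, 0.9), (1.5, 0.3), (1.7, 0.8), (0.2, 1.6), (1.1, 1.9)]
folds = spatial_block_folds(coords, block_size=1.0, n_folds=2)
train_idx, test_idx = train_test_split_for_fold(folds, 1)
```

In practice the block size would be chosen relative to the range of spatial autocorrelation in the residuals, which this toy example does not estimate.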
|
12 |
Properties of Hurdle Negative Binomial Models for Zero-Inflated and Overdispersed Count Data
Bhaktha, Nivedita January 2018
No description available.
|
13 |
Broadband's Role in Agricultural Job Postings in U.S. Counties
Douglas John Abney (13803703) 07 February 2023
<p>This study examines the relationship between broadband and online agricultural job postings. While rural broadband has been widely studied, little attention has been paid to broadband’s relationship to agricultural job demand. This research uses a spatial count model to estimate agriculture and digital agriculture job postings by U.S. county. Data were collected using the Google Jobs API developed by SerpAPI. Job advertisements were collected monthly from June 2021 through June 2022 and again in November 2022. Digital agriculture job postings were extracted as a subset of the overall dataset: context analysis searched job advertisements for key terms to filter and identify digital agriculture jobs. These openings were identified in order to examine how broadband relates to data-intensive jobs in agriculture. Jobs focused on digital agriculture require higher levels of technology and greater data throughput, so we hypothesized that the occurrence of digital agriculture job openings would depend on broadband. Broadband is measured in this study by average download speed, average upload speed, the percentage of households with internet access, and the percentage of the population with internet speeds at or above 100/20 megabits per second. Because the count data contain many zeros and are overdispersed, they are modeled with a hurdle negative binomial regression. Spatial effects were incorporated into the model to account for spatial autocorrelation and to capture the relationship between agricultural job openings in surrounding counties. Our findings support the funding of broadband policies in agriculture. Controlling for outside factors such as demographics and county production, we found that agriculture job openings were positively related to broadband.
However, broadband metrics showed no relationship with the presence of digital agriculture job openings, likely because of their rarity and potential seasonality in the data. These results serve as a stepping stone toward a better understanding of broadband’s impact on agriculture and may support future studies that seek to establish causal relationships between broadband and agricultural jobs. </p>
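To make the hurdle negative binomial structure concrete, here is a minimal sketch of its probability mass function without covariates or spatial effects: a zero occurs with probability `p_zero`, and positive counts follow a negative binomial truncated at zero. The parameter names (`p_zero`, `mu`, `k`) and the mean/dispersion parameterization are assumptions chosen for illustration; the study's actual model specification is not given in the abstract.

```python
import math

def nb_logpmf(y, mu, k):
    """Log-pmf of a negative binomial with mean mu and dispersion (size) k."""
    return (math.lgamma(y + k) - math.lgamma(k) - math.lgamma(y + 1)
            + k * math.log(k / (k + mu)) + y * math.log(mu / (k + mu)))

def hurdle_nb_pmf(y, p_zero, mu, k):
    """Hurdle NB: zeros occur with probability p_zero; positive counts follow
    a zero-truncated NB scaled by (1 - p_zero)."""
    if y == 0:
        return p_zero
    p0_nb = math.exp(nb_logpmf(0, mu, k))  # mass the untruncated NB puts on zero
    return (1.0 - p_zero) * math.exp(nb_logpmf(y, mu, k)) / (1.0 - p0_nb)

# Sanity check with illustrative values: the pmf should sum to one.
total = sum(hurdle_nb_pmf(y, p_zero=0.6, mu=3.0, k=0.8) for y in range(500))
```

In a regression setting, `p_zero` would typically be linked to covariates through a logit link and `mu` through a log link, with the two parts fitted separately.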
|
14 |
Abundância de aves de rapina no Cerrado e Pantanal do Mato Grosso do Sul e os efeitos da degradação de hábitat: perspectivas com métodos baseados na detectabilidade / Raptor abundance in the Brazilian Cerrado and Pantanal: insights from detection-based methods
Dénes, Francisco Voeroes 12 September 2014
Urbanization and the expansion of agricultural frontiers are among the main forces driving the degradation of natural open habitats in the Neotropics. Inference and estimates of abundance are critical for quantifying population dynamics and the impacts of environmental change. Yet imperfect detection and other phenomena that cause zero inflation can induce estimation error and obscure ecological patterns. We examine how detection error and zero-inflation in count data of unmarked individuals inform the choice of analytical method for estimating population size. We review established methods (GLMs and distance sampling) and emerging methods that use N-mixture models (Royle-Nichols model, and basic, zero-inflated, temporary emigration, beta-binomial, generalized open-population, spatially explicit, single-visit and multispecies) to estimate abundance of unmarked populations. As a case study, we employed a single-visit N-mixture approach to model roadside raptor count data and investigate how land-use transformations in the Cerrado and Pantanal domains of Mato Grosso do Sul, Brazil, have affected the populations of 12 species on a regional scale (>300,000 km²). Methods differ in sampling design requirements, and their suitability will depend on the study species, the scale and objectives of the study, and financial and logistical considerations, which should be evaluated to use funds, time and effort efficiently.
In the case study, detection of all species was influenced by time of day, with effects that follow expectations based on foraging and flying behavior. Closed vegetation and carcasses found during surveys also influenced detection of some species. Abundance of most species was negatively influenced by conversion of natural Cerrado and Pantanal habitats to anthropogenic uses, particularly pastures and soybean and sugar cane plantations, even for generalist species usually considered poor habitat-quality indicators. Protection of the remaining natural habitats is essential to prevent further decline of raptor populations in the study area, especially in the Cerrado domain.
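For readers unfamiliar with N-mixture models, the following sketch computes the marginal likelihood of repeated counts at one site under the basic form: latent abundance N is Poisson with mean `lam`, and each count is Binomial(N, p) with detection probability `p`. This is the basic repeated-visit model named in the review, not the single-visit variant used in the case study. The function names and the truncation point `n_max` are assumptions made for illustration.

```python
import math

def poisson_pmf(n, lam):
    """Poisson pmf computed on the log scale for numerical stability."""
    return math.exp(-lam + n * math.log(lam) - math.lgamma(n + 1))

def binom_pmf(y, n, p):
    """Binomial pmf: probability of detecting y of n individuals."""
    return math.comb(n, y) * p**y * (1 - p)**(n - y)

def nmixture_site_likelihood(counts, lam, p, n_max=200):
    """Marginal likelihood of the counts at one site, summing the latent
    abundance N from max(counts) up to the truncation point n_max."""
    total = 0.0
    for n in range(max(counts), n_max + 1):
        lik = poisson_pmf(n, lam)
        for y in counts:
            lik *= binom_pmf(y, n, p)
        total += lik
    return total

# Three repeat visits to one site, with illustrative parameter values.
lik = nmixture_site_likelihood([2, 3, 1], lam=4.0, p=0.5)
```

With a single count the marginal reduces to a Poisson with mean `lam * p`, which is why abundance and detection cannot be separated without repeated visits or extra covariate structure.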
|
15 |
Understanding patterns of aggregation in count data
Sebatjane, Phuti 06 1900
In this thesis the terms aggregation and overdispersion are used interchangeably. To address the prevalence of infectious parasite species faced by most rural livestock farmers, we model the distribution of faecal egg counts of 15 parasite species (13 internal parasites and 2 ticks) common in sheep and goats. Aggregation and excess zeroes are addressed through the use of generalised linear models. The abundance of each species was modelled using six different distributions: the Poisson, negative binomial (NB), zero-inflated Poisson (ZIP), zero-inflated negative binomial (ZINB), zero-altered Poisson (ZAP) and zero-altered negative binomial (ZANB), and their fits were then compared. Excess-zero models (ZIP, ZINB, ZAP and ZANB) fitted better than standard count models (Poisson and negative binomial) in all 15 cases. We further investigated how the distributional assumption affects aggregation and zero inflation. Aggregation and zero inflation (measured by the dispersion parameter k and the zero-inflation probability) were found to vary greatly with the distributional assumption; this in turn changed the fixed-effects structure. Serial autocorrelation between adjacent observations was then taken into account by fitting observation-driven time series models to the data. Simultaneously accounting for autocorrelation, overdispersion and zero inflation proved successful, as zero-inflated autoregressive models performed better than zero-inflated models in most cases. Beyond its contribution to scientific knowledge, the predictability of parasite burden will help farmers plan effective disease management interventions. Researchers confronted with the task of analysing count data with excess zeroes can use the findings of this illustrative study as a guideline irrespective of their research discipline. Statistical methods ranging from model selection and the quantification of zero inflation through to accounting for serial autocorrelation are described and illustrated. / Statistics / M.Sc. (Statistics)
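For reference, the standard textbook forms of the zero-inflated and zero-altered Poisson distributions are given below, together with the negative binomial mean-variance relation in which the dispersion parameter k appears. The symbols \pi, \lambda and \mu are notation assumed here, since the abstract does not define any. Note that \pi means something different in the two models, which is one reason the estimated zero-inflation probability changes with the distributional assumption.

```latex
% Zero-inflated Poisson (ZIP): zeros arise from two sources
P(Y = 0) = \pi + (1 - \pi)\, e^{-\lambda}, \qquad
P(Y = y) = (1 - \pi)\, \frac{e^{-\lambda} \lambda^{y}}{y!}, \quad y \ge 1

% Zero-altered Poisson (ZAP, hurdle): all zeros arise from one source
P(Y = 0) = \pi, \qquad
P(Y = y) = (1 - \pi)\, \frac{e^{-\lambda} \lambda^{y}}{y!\,\bigl(1 - e^{-\lambda}\bigr)}, \quad y \ge 1

% Negative binomial with mean \mu and dispersion parameter k
\operatorname{Var}(Y) = \mu + \frac{\mu^{2}}{k}
```

Smaller values of k correspond to stronger aggregation, and the Poisson is recovered as k grows large.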
|
17 |
Extensions of nonnegative matrix factorization for exploratory data analysis / 探索的なデータ分析のための非負値行列因子分解の拡張 / タンサクテキナ データ ブンセキ ノ タメ ノ ヒフチ ギョウレツ インシ ブンカイ ノ カクチョウ
阿部 寛康, Hiroyasu Abe 22 March 2017
Nonnegative matrix factorization (NMF) is a matrix decomposition technique for data matrices whose elements are all nonnegative. This thesis discusses extensions of NMF for exploratory data analysis that take into account features commonly seen in real data matrices and aim for easier interpretation. Specifically, it covers probability distributions and divergences for handling zero-inflated matrices and matrices containing outliers, the number of factor matrices in the decomposition (two-factor vs. three-factor), and orthogonality constraints on the factor matrices. / 博士(文化情報学) / Doctor of Culture and Information Science / 同志社大学 / Doshisha University
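As background, the sketch below shows plain two-factor NMF with the well-known multiplicative update rules for the squared Euclidean (Frobenius) error. It does not implement the zero-inflated, robust, three-factor or orthogonal extensions the thesis discusses. The helper names, the toy matrix and the small constant `eps` guarding against division by zero are assumptions made for this example.

```python
import random

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def frob_error(V, W, H):
    """Squared Frobenius norm of V - WH."""
    WH = matmul(W, H)
    return sum((v - wh) ** 2 for rv, rwh in zip(V, WH) for v, wh in zip(rv, rwh))

def nmf(V, rank, n_iter=200, seed=0, eps=1e-12):
    """Two-factor NMF, V ~ W H, via multiplicative updates."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    errors = [frob_error(V, W, H)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), elementwise
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * a / (d + eps) for h, a, d in zip(rh, ra, rd)]
             for rh, ra, rd in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T), elementwise
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (d + eps) for w, a, d in zip(rw, ra, rd)]
             for rw, ra, rd in zip(W, num, den)]
        errors.append(frob_error(V, W, H))
    return W, H, errors

# Small nonnegative toy matrix containing several zeros.
V = [[5, 3, 0, 1],
     [4, 0, 0, 1],
     [1, 1, 0, 5],
     [0, 0, 5, 4]]
W, H, errors = nmf(V, rank=2)
```

Because every update multiplies by a ratio of nonnegative quantities, the factors stay nonnegative without any explicit projection step.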
|