21 |
FITTING A DISTRIBUTION TO CATASTROPHIC EVENT
Osei, Ebenezer, 15 December 2010
Statistics is a branch of mathematics that is heavily employed in actuarial mathematics. This thesis first reviews the importance of statistical distributions in the analysis of insurance problems and the applications of statistics in risk and insurance. The Normal, Log-normal, Pareto, Gamma, standard Beta, Frechet, Gumbel, Weibull, Poisson, binomial, and negative binomial distributions are examined, and their importance in general insurance is emphasized. A careful review of the literature is intended to provide practitioners in the general insurance industry with statistical tools of immediate application, including estimation methods and fit statistics popular in the industry. Finally, the thesis fits statistical distributions to flood loss data for the 50 states of the United States.
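As a rough illustration of the "estimation method plus fit statistic" workflow the abstract describes, the sketch below fits a log-normal distribution by maximum likelihood and computes a Kolmogorov-Smirnov distance. The function names and the choice of the log-normal and the KS statistic are assumptions made for illustration; the abstract does not say which estimator or fit statistic the thesis used for the flood data.

```python
import math
from statistics import NormalDist, fmean


def fit_lognormal(losses):
    """Maximum-likelihood estimates (mu, sigma) of a log-normal distribution."""
    logs = [math.log(x) for x in losses]
    mu = fmean(logs)
    # The MLE of sigma divides by n, not n - 1.
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / len(logs))
    return mu, sigma


def ks_statistic(losses, mu, sigma):
    """Kolmogorov-Smirnov distance between the data and the fitted log-normal."""
    dist = NormalDist(mu, sigma)
    xs = sorted(losses)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        cdf = dist.cdf(math.log(x))
        # Compare the fitted CDF with the empirical CDF just before and at x.
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d
```

A smaller KS distance indicates a closer fit; comparing it across candidate distributions is one simple way to rank them.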
|
22 |
The k-Sample Problem When k is Large and n Small
Zhan, Dongling, 2012 May 1900
The k-sample problem, i.e., testing whether two or more data sets come from the same population, is a classic one in statistics. Instead of a small number, k, of groups of samples, this dissertation considers a large number, p, of groups, where within each group the sample size, n, is a fixed, small number. We call this the "Large p, but Small n" setting. The primary goal of the research is to provide a test statistic based on kernel density estimation (KDE) that has an asymptotic normal distribution when p goes to infinity with n fixed.
In this dissertation, we propose a test statistic called Tp(S) and its standardized version, T(S). By using T(S), we conduct our test based on the critical values of the standard normal distribution. Theoretically, we show that our test is invariant to a location and scale transformation of the data. We also find conditions under which our test is consistent. Simulation studies show that our test has good power against a variety of alternatives. The real data analyses show that our test finds differences between gene distributions that are not due simply to location.
|
23 |
IDENTIFICATION OF HIGH COLLISION LOCATIONS FOR THE CITY OF REGINA USING GIS AND POST-NETWORK SCREENING ANALYSIS
2013 August 1900
In 2010, the American Association of State Highway and Transportation Officials (AASHTO) released the first edition of the Highway Safety Manual (HSM). The HSM introduces a six-step safety management process which provides engineers with a systematic and scientific approach to managing road safety. The first step of this process, network screening, aims to identify the locations that will most benefit from a safety improvement program. The output obtained from network screening is simply a list of locations that have a high concentration of collisions, based on their potential for safety improvement. The ranking naturally tends to lead to the assumption that the most highly ranked locations are the obvious target locations where road authorities should allocate their often-limited road safety resources. Though these locations contain the highest frequency of collisions, they are often spatially unrelated, and scattered throughout the roadway network. Allocating safety resources to these locations may not be the most effective method of increasing road safety.
The purpose of this research is to investigate and validate a two-step method of post-network screening analysis, which identifies collision hotzones (i.e., groups of neighboring hotspots) on a road network. The first step is the network screening process described in the HSM. The second step is new and involves network-constrained kernel density estimation (KDE), a type of spatial analysis. KDE uses expected collision counts to estimate collision density, and outputs a graphical display that shows areas (referred to here as hotzones) with high collision densities. A particularly interesting area of application is the identification of high-collision corridors that may benefit from a program of systemic safety improvements. The proposed method was tested using five years of collision data (2005-2009) for the City of Regina, Saskatchewan. Three different network screening measures were compared: 1) observed collision counts, 2) observed severity-weighted collision counts, and 3) expected severity-weighted collision counts. The study found that observed severity-weighted collision counts produced a dramatic picture of the City's hotzones, but this picture could be misleading as it could be heavily influenced by a small number of severe collisions. The results obtained from the expected severity-weighted collision counts smoothed the effects of the severity-weighting and successfully reduced regression-to-the-mean bias. A comparison was made between the proposed approach and the results of the HSM’s existing network screening method. As the proposed approach takes the spatial association of roadway segments into account, and is not limited to single roadway segments, the identified hotzones capture a higher number of expected EPDO collisions than the existing HSM methodology. The study concludes that the proposed two-step method can help transportation safety professionals to prioritize hotzones within high-collision corridors more efficiently and scientifically.
Jurisdiction-specific safety performance functions (SPFs) were also developed over the course of this research, for both intersections (three-leg unsignalized, four-leg unsignalized, three and four-leg signalized), and roadway segments (major arterials, minor arterials, and collectors). These SPFs were compared to the base SPFs provided in the HSM, as well as calibrated HSM SPFs. To compare the different SPFs and find the best-fitting SPFs for the study region, the study used statistical goodness-of-fit (GOF) tests and cumulative residual (CURE) plots. Based on the results of this research, the jurisdiction-specific SPFs were found to provide the best fit to the data, and would be the best SPFs for predicting collisions at intersections and roadway segments in the City of Regina.
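The "expected collision counts" that reduce regression-to-the-mean bias are typically obtained by combining an SPF prediction with the observed count. The sketch below shows the site-level Empirical Bayes weighting commonly associated with the HSM; that the thesis used exactly this form is an assumption, and the function and parameter names are hypothetical.

```python
def eb_expected_count(observed_total, predicted_per_year, years, overdispersion):
    """Empirical Bayes expected collision count for one site.

    observed_total      collisions observed over the study period
    predicted_per_year  SPF prediction for the site (collisions per year)
    years               length of the study period in years
    overdispersion      overdispersion parameter k of the SPF
    """
    predicted_total = predicted_per_year * years
    # The weight on the SPF prediction shrinks as the prediction becomes
    # less reliable (larger k) or accumulates over more years.
    weight = 1.0 / (1.0 + overdispersion * predicted_total)
    return weight * predicted_total + (1.0 - weight) * observed_total
```

With no overdispersion the weight is 1 and the estimate equals the SPF prediction; with large overdispersion it approaches the observed count.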
|
24 |
Choosing a Kernel for Cross-Validation
Savchuk, Olga, 14 January 2010
The statistical properties of cross-validation bandwidths can be improved by choosing an appropriate kernel, which is different from the kernels traditionally used for cross-validation purposes. In the light of this idea, we developed two new methods of bandwidth selection, termed Indirect cross-validation and Robust one-sided cross-validation. The kernels used in the Indirect cross-validation method yield an improvement in the relative bandwidth rate to $n^{-1/4}$, which is substantially better than the $n^{-1/10}$ rate of the least squares cross-validation method. The robust kernels used in the Robust one-sided cross-validation method eliminate the bandwidth bias for the case of regression functions with discontinuous derivatives.
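For orientation, the baseline that the abstract compares against, least squares cross-validation (LSCV), can be sketched as follows for a Gaussian kernel in the density estimation setting. This is only the standard LSCV criterion, not the Indirect or Robust one-sided methods proposed in the thesis; the function names and the grid search over candidate bandwidths are assumptions.

```python
import math


def normal_pdf(u, s):
    """Density of N(0, s^2) evaluated at u."""
    return math.exp(-0.5 * (u / s) ** 2) / (s * math.sqrt(2.0 * math.pi))


def lscv_score(data, h):
    """LSCV criterion: integral of fhat^2 minus twice the mean leave-one-out density."""
    n = len(data)
    integral_term = 0.0
    loo_term = 0.0
    for i, xi in enumerate(data):
        for j, xj in enumerate(data):
            d = xi - xj
            # The convolution of two Gaussian kernels of scale h has scale sqrt(2) * h.
            integral_term += normal_pdf(d, math.sqrt(2.0) * h)
            if i != j:
                loo_term += normal_pdf(d, h)
    return integral_term / n ** 2 - 2.0 * loo_term / (n * (n - 1))


def lscv_bandwidth(data, candidates):
    """Return the candidate bandwidth with the smallest LSCV score."""
    return min(candidates, key=lambda h: lscv_score(data, h))
```

The double loop is quadratic in the sample size, which is acceptable for a sketch but not for large data sets.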
|
25 |
Geologic Factors Affecting Hydrocarbon Occurrence in Paleovalleys of the Mississippian-Pennsylvanian Unconformity in the Illinois Basin
London, Jeremy Taylor, 01 May 2014
Paleovalleys associated with the Mississippian-Pennsylvanian unconformity have been identified as potential targets for hydrocarbon exploration in the Illinois Basin. Though there is little literature addressing the geologic factors controlling hydrocarbon accumulation in sub-Pennsylvanian paleovalleys basin-wide, much work has been done to identify the Mississippian-Pennsylvanian unconformity, characterize the Chesterian and basal Pennsylvanian lithology, map the sub-Pennsylvanian paleogeology and delineate the pre-Pennsylvanian paleovalleys in the Illinois Basin. This study uses Geographic Information Systems (GIS) to determine the geologic factors controlling the distribution of hydrocarbon-bearing sub-Pennsylvanian paleovalley fill in the Illinois Basin. A methodology was developed to identify densely drilled areas without associated petroleum occurrence in basal Pennsylvanian paleovalley fill. Kernel density estimation was used to approximate drilling activity throughout the basin and identify “hotspots” of high well density. Pennsylvanian oil and gas fields were compared to the hotspots to identify which areas were most likely unrelated to Pennsylvanian production. Those hotspots were then compared to areas with known hydrocarbon accumulations in sub-Pennsylvanian paleovalleys to determine what varies geologically amongst these locations. Geologic differences provided insight regarding the spatial distribution of hydrocarbon-bearing sub-Pennsylvanian paleovalleys in the Illinois Basin. It was found that the distribution of hydrocarbon-bearing paleovalleys in the Illinois Basin follows structural features and faults. In the structurally dominated portions of the Illinois Basin, especially in eastern Illinois along the La Salle Anticlinal Belt, hydrocarbons migrate into paleovalleys from underlying hydrocarbon-rich sub-Pennsylvanian paleogeology.
Along the fault-dominated areas, such as the Wabash, Rough Creek and Pennyrile Fault Zones, migration occurs upwards along faults from deeper sources. Cross sections were made to gain a better understanding of the paleovalley reservoir and to assess the utility of using all the data collected in this study to locate paleovalley reservoirs. The Main Consolidated Field in Crawford County, Illinois, was chosen as the best site for subsurface mapping due to its high well density, associated Pennsylvanian production, and locally incised productive Chesterian strata. Four cross sections revealed a complex paleovalley reservoir with many potential pay zones. The methodology used to locate this paleovalley reservoir can be applied to other potential sites within the Illinois Basin and to other basins as well.
|
26 |
On the Modifiable Areal Unit Problem and kernel home range analyses: the case of woodland caribou (Rangifer tarandus caribou)
Kilistoff, Kristen, 10 September 2014
There are myriad studies of animal habitat use that employ the notion of “home range”. Aggregated information on animal locations provides insight into a geographically discrete unit that represents the use of space by an animal. Among the various methods to delineate home range is the commonly used Kernel Density Estimation (KDE). The KDE method delineates home ranges based on an animal’s Utilization Distribution (UD). Specifically, a UD estimates a three-dimensional surface representing the probability or intensity of habitat use by an animal based on known locations. The choice of bandwidth (i.e., kernel radius) in KDE determines the level of smoothing and thus ultimately circumscribes the size and shape of an animal’s home range. The bounds of interest in a home range can then be delineated using different volume contours of the UD (e.g., 95% or 50%). Habitat variables can then be assessed within the chosen UD contour(s) to ascertain selection for certain habitat characteristics.
Home range analyses that utilize the KDE method, and indeed all methods of home range delineation, are subject to the Modifiable Areal Unit Problem (MAUP), whereby changes in the scale at which data (e.g., habitat variables) are analysed can alter the outcome of statistical analyses and the resulting ecological inferences. There are two components to the MAUP: the scale effect and the zoning effect. The scale effect refers to changes to the data and, consequently, to the outcome of analyses as a result of aggregating data to coarser spatial units of analysis. The aggregation of data can result in a loss of fine-scale detail as well as change the observed spatial patterns. The zoning effect refers to how, when holding scale constant, the delineation of areal units in space can alter data values and ultimately the results of analyses. For example, habitat features captured within 1 km² gridded sampling units may change if 1 km² hexagon units are used instead.
This thesis holds that there are three “modifiable” factors in home range analyses that render them subject to the MAUP. The first two relate specifically to the use of the KDE method, namely the choice of bandwidth and UD contour. The third is the grain (e.g., resolution) at which habitat variables are aggregated, which applies to KDE but also more broadly to other quantitative methods of home range delineation.
In the following chapters we examine the changes in values of elevation and slope that result from changes to KDE bandwidth (Chapter 2), UD contour (Chapter 3), and DEM resolution (Chapter 4). In each chapter we also examine how the observed effects of altering each individual parameter of scale (e.g., bandwidth) change when different scales of the other two parameters (e.g., contour and resolution) are considered. We expected that the scale of each parameter examined would change the observed effect of the other parameters; for example, that the homogenization of data at coarser resolutions would reduce the degree of difference in variable values between UD contours of each home range.
To explore the potential effects of the MAUP on home range analyses, we used 13 northern woodland caribou (Rangifer tarandus) as a model population. We created seasonal home ranges (winter, calving, summer, rut and fall) for each caribou using three different KDE bandwidths. Within each home range we delineated four contours based on differing levels of an animal’s UD. We then calculated values of elevation and slope (mean, standard deviation and coefficient of variation) using a Digital Elevation Model (DEM) aggregated to four different resolutions within the contours of each seasonal home range.
We found that each parameter of scale significantly changed the values of elevation and slope within the home ranges of the model caribou population. The magnitude as well as the direction of change in slope and elevation often varied depending on the specific contour or season. There was a greater decrease in the variability of elevation within the fall and winter seasons at smaller KDE bandwidths. The topographic variables were significantly different between all contours of caribou home ranges, and the differences between contours were, in general, significantly higher in fall and winter (elevation) or calving and summer (slope). The mean and SD of slope decreased at coarser resolutions in all caribou home ranges, whereas there was no change in elevation. We also found interactive effects of all three parameters of scale, although these were not always as direct as initially anticipated. Each parameter examined (bandwidth, contour and resolution) may potentially alter the outcome of northern woodland caribou habitat analyses.
We conclude that home range analyses that utilize the KDE method may be subject to the MAUP by virtue of the ability to modify the spatial dimensions of the units of analysis. As such, in habitat analyses using KDE, careful consideration should be given to the choice of bandwidth, UD contour, and habitat variable resolution.
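The dependence of home range size on bandwidth and UD contour can be made concrete with a small sketch. It evaluates a bivariate Gaussian KDE on a regular grid and measures the area enclosed by a volume contour (e.g., 95% or 50%). The Gaussian kernel, the grid evaluation, and all function names are assumptions for illustration; the abstract does not specify the software or kernel used.

```python
import math


def kde_grid(points, h, xs, ys):
    """Bivariate Gaussian KDE with bandwidth h, evaluated at every (x, y) grid node."""
    norm = 1.0 / (2.0 * math.pi * h * h * len(points))
    grid = []
    for y in ys:
        row = []
        for x in xs:
            s = sum(
                math.exp(-((x - px) ** 2 + (y - py) ** 2) / (2.0 * h * h))
                for px, py in points
            )
            row.append(norm * s)
        grid.append(row)
    return grid


def contour_area(grid, cell_area, level):
    """Area of the smallest set of cells holding `level` of the gridded UD volume.

    The volume is normalised by the grid total, so density falling outside
    the grid extent is ignored.
    """
    cells = sorted((v for row in grid for v in row), reverse=True)
    total = sum(cells)
    accumulated = 0.0
    count = 0
    for v in cells:
        if accumulated >= level * total:
            break
        accumulated += v
        count += 1
    return count * cell_area
```

Calling `contour_area` on grids built with different `h` values, contour levels, or cell sizes reproduces, in miniature, the three modifiable factors the thesis examines.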
|
27 |
Statistical gas distribution modelling for mobile robot applications
Reggente, Matteo, January 2014
In this dissertation, we present and evaluate algorithms for statistical gas distribution modelling in mobile robot applications. We derive a representation of the gas distribution in natural environments using gas measurements collected with mobile robots. The algorithms fuse different sensor readings (gas, wind and location) to create 2D or 3D maps. Throughout this thesis, the Kernel DM+V algorithm plays a central role in modelling the gas distribution. The key idea is the spatial extrapolation of the gas measurements using a Gaussian kernel. The algorithm produces four maps: the weight map shows the density of the measurements; the confidence map shows areas in which the model is considered trustworthy; the mean map represents the modelled gas distribution; the variance map represents the spatial structure of the variance of the mean estimate. The Kernel DM+V/W algorithm incorporates wind measurements in the computation of the models by modifying the shape of the Gaussian kernel according to the local wind direction and magnitude. The Kernel 3D-DM+V/W algorithm extends the previous algorithm to the third dimension using a tri-variate Gaussian kernel. Ground-truth evaluation is a critical issue for gas distribution modelling with mobile platforms. We propose two methods to evaluate gas distribution models. Firstly, we create a ground-truth gas distribution using a simulation environment, and we compare the models with this ground-truth gas distribution. Secondly, considering that a good model should explain the measurements and accurately predict new ones, we evaluate the models according to their ability to infer unseen gas concentrations. We evaluate the algorithms by carrying out experiments in different environments. We start with a simulated environment and end with urban applications, in which we integrated gas sensors on robots designed for urban hygiene.
We found that typically the models that comprise wind information outperform the models that do not include the wind data.
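The four maps named in the abstract can be sketched for a single query location as below. This is a simplified reading of Kernel DM+V without the wind extension: the confidence function `1 - exp(-omega^2 / sigma_omega^2)`, the use of the average reading as the fallback mean, and all names are assumptions based on common descriptions of the algorithm, not details confirmed by the abstract.

```python
import math


def _gauss(d2, sigma):
    """Isotropic 2D Gaussian kernel evaluated at squared distance d2."""
    return math.exp(-d2 / (2.0 * sigma ** 2)) / (2.0 * math.pi * sigma ** 2)


def _weighted_sums(q, samples, values, sigma):
    """Return (sum of kernel weights, kernel-weighted sum of values) at location q."""
    omega = 0.0
    total = 0.0
    for (x, y), v in zip(samples, values):
        w = _gauss((q[0] - x) ** 2 + (q[1] - y) ** 2, sigma)
        omega += w
        total += w * v
    return omega, total


def _mean_at(q, samples, readings, sigma, sigma_omega, r0):
    omega, total = _weighted_sums(q, samples, readings, sigma)
    # Assumed confidence function: close to 1 where measurements are dense.
    alpha = 1.0 - math.exp(-(omega ** 2) / sigma_omega ** 2)
    mean = alpha * total / omega + (1.0 - alpha) * r0 if omega > 0.0 else r0
    return omega, alpha, mean


def kernel_dmv(q, samples, readings, sigma, sigma_omega):
    """Return (weight, confidence, mean, variance) at query location q."""
    r0 = sum(readings) / len(readings)
    # Squared residual of each reading against the model mean at its own location.
    res2 = [
        (r - _mean_at(p, samples, readings, sigma, sigma_omega, r0)[2]) ** 2
        for p, r in zip(samples, readings)
    ]
    v0 = sum(res2) / len(res2)
    omega, alpha, mean = _mean_at(q, samples, readings, sigma, sigma_omega, r0)
    _, v_total = _weighted_sums(q, samples, res2, sigma)
    var = alpha * v_total / omega + (1.0 - alpha) * v0 if omega > 0.0 else v0
    return omega, alpha, mean, var
```

Evaluating `kernel_dmv` over a grid of query locations would yield the four maps; far from any measurement the confidence drops toward zero and the estimates fall back to the global averages.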
|
28 |
Association of Ascaris lumbricoides with asthma and its spatial distribution in the Pedregal neighborhood, Campina Grande, PB / Programa de pós-graduação em medicina e saúde (Graduate Program in Medicine and Health)
Bragagnoli, Gerson, January 2013
OBJECTIVE: To study the association between asthma and ascariasis and its spatial distribution in children aged 2 to 10 years in the Pedregal neighborhood, Campina Grande, PB.
METHODS: Cross-sectional study conducted between January and November 2007. A total of 1004 standard International Study of Asthma and Allergies in Childhood (ISAAC) questionnaires were administered, and a container for stool collection was provided. The Ritchie method was used for the stool parasitological examination, and the Kato-Katz method was used to calculate parasite load. The geographic position of the residences was recorded with a Garmin GPS device. The t test, Pearson's chi-square (χ²) test, the chi-square test for linear trend, and logistic regression were used, with odds ratios (OR) and confidence intervals (CI). For the spatial analysis, the database and the geographic coordinates were organized in ArcGIS 9.3; a bandwidth of 50 meters and a regular grid of 5 x 5 cells were defined.
RESULTS: Associations of light and heavy parasite loads were significant for all asthma symptoms (p<0.05). Significant associations were also found between infection and gender, maternal education, and mean age; and between infected asthmatic children and age group, family income, mean age, and A. lumbricoides and T. trichiura coinfection (p<0.05). Kernel analysis of the association of A. lumbricoides infection with asthma showed that the distribution of cases is not homogeneous, and that the clusters tend to concentrate in the higher areas of the neighborhood, relatively far from the open sewage ditch that crosses it. Logistic regression made it possible to identify the predictor variables for asthma.
CONCLUSION: A light parasite load of A. lumbricoides infection acted as a protective factor against asthma and masked its symptoms, whereas a high parasite load, characterized as a risk factor, made its symptoms evident. The kernel analyses (density and risk ratio) indicated the locations at greatest risk of contamination by A. lumbricoides, and logistic regression identified the statistically significant independent variables for asthma risk. / Salvador
|
29 |
ILLINOIS STATEWIDE HEALTHCARE AND EDUCATION MAPPING
KC, Binita, 01 December 2010
Illinois statewide infrastructure mapping provides a basis for the economic development of the state. As a part of infrastructure mapping, this study focuses on mapping healthcare and education services for Illinois. Over 4337 K-12 schools and 1331 hospitals and long-term care facilities were used in analyzing healthcare and education services. Education service was measured as the ratio of population to teachers and healthcare service as the ratio of population to beds. Both services were mapped using three mapping techniques: Choropleth mapping, Thiessen polygons, and Kernel Density Estimation. The mapping was also conducted at three scales: county, census tract, and ZIP code area. The resulting maps were compared by visual interpretation and statistical correlation analysis. Moreover, spatial pattern analysis of the maps was conducted using global and local Moran's I, high/low clustering, and hotspot analysis methods. In addition, multivariate mapping was carried out to demonstrate the spatial distributions of multiple variables and their relationships. The results showed that both the Choropleth mapping and Thiessen polygon methods produced service levels that were homogeneous throughout the polygons and changed abruptly at the boundaries, and hence ignored the cross-boundary flow of people for healthcare and education services. In addition, they do not reflect the distance decay of services. Kernel Density mapping quantified the continuous and variable healthcare and educational services and has the potential to provide more accurate estimates of these services. Moreover, the county-scale maps are more reliable than the census tract and ZIP code area maps. In addition, the multivariate map, obtained by a legend design that combined the values of multiple variables, demonstrated well the spatial distributions of healthcare and education services along with per capita income and the relationships between them.
Overall, Morgan, Wayne, Mason, and Ford counties had higher services for both education and healthcare whereas Champaign, Johnson, and Perry had lower service levels of healthcare and education. Generally, cities and the areas close to cities have better healthcare and educational service than other areas because of higher per capita income. In addition to numbers of hospitals and schools, the healthcare and education service levels were also affected by populations and per capita income. Additionally, other factors may also have influence on the service levels but were not taken into account in this study because of limited time and data.
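Of the spatial pattern measures listed in the abstract, global Moran's I is the simplest to state. The sketch below computes it from a list of areal values and a spatial weights matrix; the function name and the binary adjacency weights used in the test are assumptions, and the thesis presumably relied on GIS software rather than hand-written code.

```python
def morans_i(values, weights):
    """Global Moran's I for areal values and an n x n spatial weights matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(weights[i][j] for i in range(n) for j in range(n))
    # Cross-products of deviations for every pair of units, weighted by proximity.
    cross = sum(
        weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n)
    )
    denom = sum(d * d for d in dev)
    return (n / w_sum) * (cross / denom)
```

Values near 1 indicate that similar service levels cluster together, values near -1 indicate a checkerboard-like pattern, and values near 0 suggest no spatial autocorrelation.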
|
30 |
Time Series Online Empirical Bayesian Kernel Density Segmentation: Applications in Real Time Activity Recognition Using Smartphone Accelerometer
Na, Shuang, 28 June 2017
Time series analysis has been explored by researchers in many areas, such as statistical research, engineering applications, medical analysis, and finance. To represent the data more efficiently, the mining process is supported by time series segmentation. A time series segmentation algorithm looks for the change points between two different patterns and develops a suitable model depending on the data observed in each segment. Because computing and storage capability are limited, it is necessary to consider an adaptive and incremental online segmentation method. In this study, we propose Online Empirical Bayesian Kernel Segmentation (OBKS), which combines Online Multivariate Kernel Density Estimation (OMKDE) and the Online Empirical Bayesian Segmentation (OBS) algorithm. This method uses the online multivariate kernel density, rather than the posterior predictive distribution, as the predictive distribution in Online Empirical Bayesian Segmentation. The benefit of Online Multivariate Kernel Density Estimation is that it does not require the assumption of a pre-defined prior function, which makes OMKDE more adaptive and adjustable than the posterior predictive distribution.
Human Activity Recognition (HAR) by smartphones with embedded sensors is a modern time series application applied in many areas, such as therapeutic applications and sensors of cars. The important procedures related to the HAR problem include classification, clustering, feature extraction, dimension reduction, and segmentation. Segmentation, as the first step of HAR analysis, attempts to represent the time interval more effectively and efficiently. The traditional segmentation method in HAR is to partition the time series into short, fixed-length segments. However, these segments might not be long enough to capture sufficient information for the entire activity time interval. In this research, we segment the observations of a whole activity as a single interval using the Online Empirical Bayesian Kernel Segmentation algorithm as the first step. A smartphone with a built-in accelerometer generates the observations of these activities.
Based on the segmentation result, we introduce a two-layer random forest classification method. The first layer is used to identify the main group; the second layer is designed to analyze the subgroup within each main group. We evaluate the performance of our method on six activities (sitting, standing, lying, walking, walking_upstairs, and walking_downstairs) performed by 30 volunteers. Because walking_upstairs and walking_downstairs are very similar, a machine that detects them automatically requires more information and more detail, from which more complicated features can be generated. For real-time activity recognition on smartphones with embedded accelerometers, the first layer classifies the activities as static or dynamic, and the second layer classifies each main group into its sub-classes, depending on the first-layer result. For the data collected, we get an overall accuracy of 91.4% based on the six activities and an overall accuracy of 100% based only on the dynamic activity (walking, walking_upstairs, walking_downstairs) and the static activity (sitting, standing, lying).
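The two-layer structure described above can be sketched independently of the classifier used in each layer. In the sketch below the layers are plain callables; the simple threshold rules in the usage note stand in for the random forests of the thesis and are purely hypothetical, as are all names and feature choices.

```python
from typing import Callable, Dict, Sequence

Features = Sequence[float]
Classifier = Callable[[Features], str]


def make_two_layer(first: Classifier, second: Dict[str, Classifier]) -> Classifier:
    """Compose a main-group classifier with one sub-class classifier per group."""

    def predict(features: Features) -> str:
        group = first(features)          # layer 1: e.g. "static" or "dynamic"
        return second[group](features)   # layer 2: sub-class within that group

    return predict


# Hypothetical stand-in rules; features = (acceleration variance, vertical trend).
def static_or_dynamic(f: Features) -> str:
    return "dynamic" if f[0] > 0.5 else "static"


def dynamic_subclass(f: Features) -> str:
    if f[1] > 0.1:
        return "walking_upstairs"
    if f[1] < -0.1:
        return "walking_downstairs"
    return "walking"


def static_subclass(f: Features) -> str:
    return "lying" if f[1] < -0.5 else "sitting"


classify = make_two_layer(
    static_or_dynamic,
    {"dynamic": dynamic_subclass, "static": static_subclass},
)
```

Keeping the layers separate means the harder distinction (upstairs versus downstairs) can use richer features without affecting the easier static/dynamic split.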
|