21 |
Evaluating the use of neighborhoods for query dependent estimation of survival prognosis for oropharyngeal cancer patients / Shay, Keegan P. 01 May 2019 (has links)
Oropharyngeal cancer diagnoses make up three percent of all cancer diagnoses in the United States each year. Recently, there has been an increase in the incidence of HPV-associated oropharyngeal cancer, necessitating updates to prior survival estimation techniques in order to properly account for this demographic shift. Clinicians depend on accurate survival prognosis estimates in order to create successful treatment plans that aim to maximize patient life while minimizing adverse treatment side effects. Additionally, recent advances have resulted in richer and more complex data, motivating the use of more advanced data analysis techniques. Sophisticated survival analysis techniques can leverage complex data from a variety of sources, resulting in improved personalized prediction. Current survival prognosis prediction methods often rely on summary statistics and on underlying assumptions regarding the survival distribution or overall risk.
We propose a k-nearest-neighbor-influenced approach for predicting oropharyngeal cancer survival outcomes and evaluate it for overall survival (OS), recurrence-free survival (RFS), and recurrence-free overall survival (RF+OS). We define two distance functions, neither subject to the curse of dimensionality, that reconcile heterogeneous features into patient-to-patient similarity scores and yield a meaningful overall measure of distance. Using these distance functions, we obtain the k nearest neighbors of each patient, forming neighborhoods of similar patients, and leverage these neighborhoods for prediction in two novel ensemble methods. The first ensemble method uses each patient's nearest neighbors to combine globally trained predictions, weighted by their accuracies within the selected neighborhood. The second ensemble method combines Kaplan-Meier predictions from a variety of neighborhoods. Both proposed methods outperform an ensemble of standard global survival predictive models, with statistically significant calibration.
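As a sketch of the second ensemble idea, the code below averages Kaplan-Meier survival estimates computed within several neighborhoods of a query patient; the neighborhood sizes, the 60-month horizon, and the unweighted average are illustrative assumptions, not the thesis's implementation.

```python
import numpy as np

def kaplan_meier(times, events, horizon):
    """Kaplan-Meier estimate of S(horizon) from (time, event) pairs,
    where event = 1 marks death/recurrence and 0 marks censoring."""
    times, events = np.asarray(times, dtype=float), np.asarray(events, dtype=int)
    order = np.argsort(times)
    times, events = times[order], events[order]
    at_risk = len(times)
    surv = 1.0
    for t, e in zip(times, events):
        if t > horizon:
            break
        if e == 1:
            surv *= 1.0 - 1.0 / at_risk   # one event among the patients still at risk
        at_risk -= 1                      # the patient leaves the risk set either way
    return surv

def neighborhood_km_prediction(distances, times, events, ks=(10, 20, 40), horizon=60.0):
    """Hypothetical neighborhood ensemble: Kaplan-Meier estimates from several
    k-nearest-neighbor neighborhoods of the query patient, then averaged.
    `distances` are precomputed query-to-training-patient distances."""
    distances, times, events = map(np.asarray, (distances, times, events))
    order = np.argsort(distances)
    estimates = [kaplan_meier(times[order[:k]], events[order[:k]], horizon) for k in ks]
    return float(np.mean(estimates))      # simple unweighted combination
```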
|
22 |
Brain Tumor Target Volume Determination for Radiation Therapy Treatment Planning Through the Use of Automated MRI Segmentation / Mazzara, Gloria Patrika 27 February 2004 (has links)
Radiation therapy seeks to effectively irradiate the tumor cells while minimizing the dose to adjacent normal cells. Prior research found that the low success rates for treating brain tumors would be improved with higher radiation doses to the tumor area. This is feasible only if the target volume can be precisely identified. However, the definition of tumor volume is still based on time-intensive, highly subjective manual outlining by radiation oncologists. In this study the effectiveness of two automated Magnetic Resonance Imaging (MRI) segmentation methods, k-Nearest Neighbors (kNN) and Knowledge-Guided (KG), in determining the Gross Tumor Volume (GTV) of brain tumors for use in radiation therapy was assessed. Three criteria were applied: accuracy of the contours; quality of the resulting treatment plan in terms of dose to the tumor; and a novel treatment plan evaluation technique based on post-treatment images.
The kNN method was able to segment all cases, while the KG method was limited to enhancing tumors and gliomas with clear enhancing edges. Various software applications were developed to create a closed, smooth contour that encompassed the tumor pixels from the segmentations and to integrate these results into the treatment planning software. A novel, probabilistic measurement of accuracy was introduced to compare the agreement of the segmentation methods with the weighted average physician volume. Both computer methods under-segmented the tumor volume when compared with the physicians but performed within the variability of manual contouring (28% ± 12% inter-operator variability).
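As a rough illustration of the kNN step, the sketch below labels voxels by majority vote over labeled training voxels in a multispectral feature space (for example T1-, T2-, and proton-density-weighted intensities); the feature choice and k are assumptions, not the study's implementation.

```python
import numpy as np
from collections import Counter

def knn_segment(train_feats, train_labels, voxels, k=7):
    """Assign each voxel (its intensity feature vector) the majority class
    among its k nearest labeled training voxels in feature space."""
    train_feats = np.asarray(train_feats, dtype=float)
    train_labels = np.asarray(train_labels)
    labels = []
    for v in np.asarray(voxels, dtype=float):
        d = np.linalg.norm(train_feats - v, axis=1)      # Euclidean distance in feature space
        nearest = train_labels[np.argsort(d)[:k]]        # labels of the k closest training voxels
        labels.append(Counter(nearest).most_common(1)[0][0])
    return np.array(labels)
```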
Computer segmentations were modified vertically to compensate for their under-segmentation. When comparing radiation treatment plans designed from physician-defined tumor volumes with treatment plans developed from the modified segmentation results, the reference target volume was irradiated within the same level of conformity. Analysis of the plans based on post-treatment MRI showed that the segmentation plans provided similar dose coverage to areas being treated by the original treatment plans.
This research demonstrates that computer segmentations provide a feasible route to automatic target volume definition. Because of the lower variability and greater efficiency of the automated techniques, their use could lead to more precise plans and better prognosis for brain tumor patients.
|
23 |
Efficient Kernel Methods for Statistical Detection / Su, Wanhua 20 March 2008 (has links)
This research is motivated by a drug discovery problem -- the AIDS anti-viral database from the National Cancer Institute. The objective of the study is to develop effective statistical methods to model the relationship between the chemical structure of a compound and its activity against the HIV-1 virus. The resulting structure-activity model can then be used to predict the activity of new compounds and thus help identify active chemical compounds that can serve as drug candidates. Since active compounds are generally rare in a compound library, we treat the drug discovery problem as an application of the so-called statistical detection problem. In a typical statistical detection problem, we have data {Xi, Yi}, where Xi is the predictor vector of the ith observation and Yi ∈ {0, 1} is its class label. The objective is to identify class-1 observations, which are extremely rare. Besides drug discovery, other applications of statistical detection include direct marketing and fraud detection.
We propose a computationally efficient detection method called LAGO, which stands for "locally adjusted GO estimator". The original idea is inspired by an ancient game known today as "GO". The construction of LAGO consists of two steps. In the first step, we estimate the density of class 1 with an adaptive bandwidth kernel density estimator. The kernel functions are located at and only at the class-1 observations. The bandwidth of the kernel function centered at a certain class-1 observation is calculated as the average distance between this class-1 observation and its K-nearest class-0 neighbors. In the second step, we adjust the density estimated in the first step locally according to the density of class 0. It can be shown that the amount of adjustment in the second step is approximately inversely proportional to the bandwidth calculated in the first step.
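One plausible reading of this two-step construction, written as a sketch rather than the thesis's exact estimator, is the following; the Gaussian kernel and the r**alpha adjustment are assumptions made for illustration.

```python
import numpy as np

def lago_scores(X1, X0, Xtest, K=5, alpha=1.0):
    """Hypothetical LAGO-style score (not the thesis's exact estimator).
    Step 1: place a Gaussian kernel at each class-1 point, with bandwidth r_i equal
    to the average distance from that point to its K nearest class-0 neighbors.
    Step 2: scale each kernel's contribution by r_i**alpha, mimicking a local
    adjustment by the inverse class-0 density, which is what the bandwidth measures."""
    X1, X0, Xtest = (np.asarray(A, dtype=float) for A in (X1, X0, Xtest))
    # bandwidths from the class-0 neighborhoods of each class-1 point
    d10 = np.linalg.norm(X1[:, None, :] - X0[None, :, :], axis=2)    # |class1| x |class0|
    r = np.sort(d10, axis=1)[:, :K].mean(axis=1)                     # average of K nearest distances
    # score every test point against the adaptively weighted class-1 kernels
    dt = np.linalg.norm(Xtest[:, None, :] - X1[None, :, :], axis=2)  # |test| x |class1|
    kernels = np.exp(-0.5 * (dt / r[None, :]) ** 2) / r[None, :]     # adaptive-bandwidth kernels
    return (kernels * r[None, :] ** alpha).sum(axis=1)               # locally adjusted sum
```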
Application to the NCI data demonstrates that LAGO is superior to methods such as K nearest neighbors and support vector machines.
One drawback of the existing LAGO is that it only provides a point estimate of a test point's probability of being class 1, ignoring the uncertainty of the model. In the second part of this thesis, we present a Bayesian framework for LAGO, referred to as BLAGO, which enables quantification of uncertainty. Non-informative priors are adopted. The posterior distribution is calculated over a grid of (K, alpha) pairs by integrating out beta0 and beta1 using the Laplace approximation, where K and alpha are the two parameters used to construct the LAGO score, and beta0 and beta1 are the coefficients of the logistic transformation that converts the LAGO score to the probability scale. BLAGO provides proper probabilistic predictions with support on (0, 1) and also captures the uncertainty of those predictions. By avoiding Markov chain Monte Carlo algorithms and using the Laplace approximation, BLAGO is computationally very efficient; without the need for cross-validation, it is even more computationally efficient than LAGO.
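For a single (K, alpha) grid point, the Laplace step amounts to fitting the logistic link by Newton-Raphson and approximating the integral over (beta0, beta1) by a Gaussian at the mode. The sketch below is a generic illustration of that computation, not BLAGO itself; flat priors and the fixed iteration count are assumptions.

```python
import numpy as np

def laplace_log_evidence(scores, labels):
    """Approximate log marginal likelihood of a logistic model
    p(y=1 | s) = 1 / (1 + exp(-(b0 + b1*s))) with (b0, b1) integrated out
    under flat priors via a Laplace approximation at the posterior mode."""
    s = np.asarray(scores, dtype=float)
    y = np.asarray(labels, dtype=float)
    X = np.column_stack([np.ones_like(s), s])
    beta = np.zeros(2)
    for _ in range(50):                                   # Newton-Raphson for the mode
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p)                              # gradient of the log-likelihood
        hess = X.T @ (X * (p * (1.0 - p))[:, None])       # observed information
        beta += np.linalg.solve(hess, grad)
    p = np.clip(1.0 / (1.0 + np.exp(-X @ beta)), 1e-12, 1.0 - 1e-12)
    hess = X.T @ (X * (p * (1.0 - p))[:, None])           # curvature at the mode
    loglik = np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
    # Laplace: log ∫ L(b) db ≈ loglik + (d/2) log(2π) - 0.5 log|H|, with d = 2
    return loglik + np.log(2.0 * np.pi) - 0.5 * np.linalg.slogdet(hess)[1]
```

The resulting log evidences, one per (K, alpha) grid point, can be exponentiated and normalized to give a posterior over the grid.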
|
25 |
Improving WiFi positioning through the use of successive in-sequence signal strength samples / Hallström, Per, Dellrup, Per January 2006 (has links)
As portable computers and wireless networks are becoming ubiquitous, it is natural to consider the user's position as yet another aspect to take into account when providing services that are tailored to meet the needs of the consumers. Location aware systems could guide persons through buildings, to a particular bookshelf in a library or assist in a vast variety of other applications that can benefit from knowing the user's position.
In indoor positioning systems, the most commonly used method for determining the location is to collect samples of the strength of the received signal from each base station that is audible at the client's position and then pass the signal strength data on to a positioning server that has been previously fed with example signal strength data from a set of reference points where the position is known. From this set of reference points, the positioning server can interpolate the client's current location by comparing the signal strength data it has collected with the signal strength data associated with every reference point.
Our work proposes the use of multiple successive received signal strength samples in order to capture periodic signal strength variations that are the result of effects such as multi-path propagation, reflections and other types of radio interference. We believe that, by capturing these variations, it is possible to more easily identify a particular point; this is due to the fact that the signal strength fluctuations should be rather constant at every position, since they are the result of, for example, reflections on the fixed surfaces of the building's interior.
For the purpose of investigating our assumptions, we conducted measurements at a site at Växjö university, where we collected signal strength samples at known points. With the data collected, we performed two different experiments: one with a neural network and one where the k-nearest-neighbor method was used for position approximation. For each of the methods, we performed the same set of tests with single signal strength samples and with multiple successive signal strength samples, to evaluate their respective performances.
We concluded that the k-nearest-neighbor method does not seem to benefit from multiple successive signal strength samples, at least not in our setup, compared to when using single signal strength samples. However, the neural network performed about 17% better when multiple successive signal strength samples were used.
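As an illustration of the fingerprinting step, the sketch below shows a plain k-nearest-neighbor position estimate from a single RSS sample per access point; the function names and the unweighted centroid are assumptions, and concatenating successive samples into a longer feature vector is one plausible way to feed the multi-sample variant into the same routine, not necessarily the authors' implementation.

```python
import numpy as np

def knn_position(fingerprints, positions, observed_rss, k=3):
    """Estimate a client's (x, y) position from observed RSS values by averaging
    the coordinates of the k reference points with the most similar fingerprints.
    `fingerprints` is an (n_points, n_access_points) array of calibration RSS values,
    and `positions` holds the known (x, y) coordinates of those reference points."""
    fingerprints = np.asarray(fingerprints, dtype=float)
    positions = np.asarray(positions, dtype=float)
    d = np.linalg.norm(fingerprints - np.asarray(observed_rss, dtype=float), axis=1)
    nearest = np.argsort(d)[:k]               # k closest matches in signal space
    return positions[nearest].mean(axis=0)    # centroid of the matching reference points
```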
|
26 |
A Hilbert Curve-Based Algorithm for Order-Sensitive Moving KNN Queries / Feng, Fei-Chung 11 July 2012 (has links)
Because wireless communication, positioning technologies, and mobile computing are developing quickly, mobile services over large spatiotemporal databases are becoming practical and important. Mobile service users move within a spatial domain, e.g., a country, and often issue K Nearest Neighbor (kNN) queries to obtain data objects reachable through the spatial database. The central challenge for mobile services is how to efficiently return to each mobile user the data objects of interest. One type of kNN query problem is the order-sensitive moving kNN (order-sensitive MkNN) query problem, in which the query point is dynamic and unpredictable and the kNN answers must be returned in real time, sorted by distance in ascending order. Therefore, how to return the kNN answers effectively, incrementally, and correctly is an important issue. Nutanong et al. have proposed the V*-kNN algorithm to process the order-sensitive MkNN query. The V*-kNN algorithm uses their V*-diagram algorithm to generate the safe region, and uses the Incremental Rank Updates (IRU) algorithm to handle events when the query point crosses the bisectors or the boundary of the safe region. However, the V*-kNN algorithm uses the BF-kNN algorithm to retrieve NNs, which is non-incremental; this makes the search time increase as the density of objects increases. Moreover, it does not consider the situation in which multiple objects share the same order, or the situation in which multiple events happen in a single step; these situations may cause incorrect kNN answers. Therefore, in this thesis, we propose the Hilbert curve-based kNN (HC-kNN) algorithm to process the order-sensitive MkNN query. The HC-kNN algorithm can handle multiple events happening in a single step, and we also propose a new data structure for the kNN answers. Next, we propose the Intersection of Perpendicular Bisectors (IPB) algorithm to handle order-update events of the kNN answers; the IPB algorithm handles the situation in which multiple objects share the same order. Finally, based on the Hilbert curve index, we propose the ONHC-kNN algorithm to retrieve NNs incrementally and to generate the safe region. The safe region is not affected as the density of objects increases, and the safe region of our algorithm is larger than that of the V*-kNN algorithm. Our simulation results show that the HC-kNN algorithm provides better performance than the V*-kNN algorithm.
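To ground the idea of a Hilbert curve index, here is a minimal sketch (assuming a power-of-two grid) of the standard coordinate-to-index mapping; the candidate-window scan at the end is purely illustrative and is not the thesis's ONHC-kNN procedure.

```python
def hilbert_index(side, x, y):
    """Map grid cell (x, y) to its position d along a Hilbert curve filling a
    side x side grid (side must be a power of two). Cells that are close in 2-D
    tend to receive close 1-D indices, which makes the index a useful sort key
    for retrieving nearest-neighbor candidates."""
    d = 0
    s = side // 2
    while s > 0:
        rx = 1 if (x & s) > 0 else 0
        ry = 1 if (y & s) > 0 else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:                               # rotate/reflect the quadrant
            if rx == 1:
                x, y = side - 1 - x, side - 1 - y
            x, y = y, x
        s //= 2
    return d

# Objects sorted by Hilbert index; NN candidates for a query can be taken from a
# window around the query's own index (an illustrative scheme only).
points = [(3, 8), (12, 2), (6, 10), (7, 7)]
indexed = sorted((hilbert_index(16, px, py), (px, py)) for px, py in points)
```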
|
27 |
The Incremental Benefits of the Nearest Neighbor Forecast of U.S. Energy Commodity Prices / Kudoyan, Olga December 2010 (has links)
This thesis compares the simple Autoregressive (AR) model against the k-Nearest Neighbor (k-NN) model for making point forecasts of five energy commodity prices: natural gas, heating oil, gasoline, ethanol, and crude oil. The data for the commodities are monthly and, for each commodity, two-thirds of the data are used for an in-sample forecast and the remaining one-third are used to perform an out-of-sample forecast. Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) are used to compare the two forecasts. The results showed that one method is superior by one measure but inferior by the other. Since the differences between the two models are minimal, it is up to the decision maker which model to choose.
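For concreteness, a minimal sketch of a k-NN point forecast and the two error measures follows; the lag length, k, and the simple averaging rule are assumptions rather than the thesis's exact specification.

```python
import numpy as np

def knn_forecast(series, k=5, lag=3):
    """One-step-ahead k-NN point forecast: find the k historical lag-vectors most
    similar to the latest `lag` observations and average the values that followed them."""
    history = np.asarray(series, dtype=float)
    pattern = history[-lag:]                                            # the most recent pattern
    candidates = np.array([history[i:i + lag] for i in range(len(history) - lag)])
    successors = history[lag:]                                          # value following each candidate
    d = np.linalg.norm(candidates - pattern, axis=1)
    return successors[np.argsort(d)[:k]].mean()

def mae(actual, forecast):
    return float(np.mean(np.abs(np.asarray(actual) - np.asarray(forecast))))

def rmse(actual, forecast):
    return float(np.sqrt(np.mean((np.asarray(actual) - np.asarray(forecast)) ** 2)))
```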
The Diebold-Mariano (DM) test was performed to test the relative accuracy of the two models. For all five commodities, the results failed to reject the null hypothesis, indicating that both models are equally accurate.
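A bare-bones version of the Diebold-Mariano statistic for one-step-ahead forecasts might look like the following; it omits the autocorrelation (HAC) correction used for multi-step horizons, so it is only a sketch.

```python
import numpy as np
from scipy import stats

def diebold_mariano(errors_a, errors_b, power=2):
    """Diebold-Mariano statistic for equal predictive accuracy of two forecasts,
    using squared-error (power=2) or absolute-error (power=1) loss on the
    one-step-ahead forecast errors of each model."""
    errors_a, errors_b = np.asarray(errors_a, dtype=float), np.asarray(errors_b, dtype=float)
    d = np.abs(errors_a) ** power - np.abs(errors_b) ** power   # loss differential series
    dm = d.mean() / np.sqrt(d.var(ddof=1) / len(d))
    p_value = 2 * (1 - stats.norm.cdf(abs(dm)))                  # asymptotically standard normal
    return dm, p_value
```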
|
29 |
A scalable metric learning based voting method for expression recognition / Wan, Shaohua 09 October 2013 (has links)
In this research work, we propose a facial expression classification method using metric learning-based k-nearest neighbor voting. To achieve accurate classification of a facial expression from frontal face images, we first learn a distance metric from the training data that characterizes the structure of the feature space, then use this metric to retrieve the nearest neighbors from the training dataset, and finally output the classification decision accordingly. An expression is represented as a fusion of face shape and texture. This representation is based on registering a face image with a landmarking shape model and extracting Gabor features from local patches around the landmarks. This type of representation achieves robustness and effectiveness by using an ensemble of local patch feature detectors at a global shape level.
A naive implementation of metric learning-based k-nearest neighbor voting would incur a time complexity proportional to the size of the training dataset, which precludes this method from being used with enormous datasets. To scale to potentially larger databases, an approach similar to that in [24] is used to achieve approximate yet efficient ML-based kNN voting based on Locality Sensitive Hashing (LSH). A query example is hashed directly to the bucket of a pre-computed hash table where candidate nearest neighbors can be found, so there is no need to search the entire database for nearest neighbors.
Experimental results on the Cohn-Kanade database and the Moving Faces and People database show that both ML-based kNN voting and its LSH approximation outperform the state of the art, demonstrating the superiority and scalability of our method.
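The LSH shortcut can be sketched with random hyperplane hashing: a query is hashed to a bucket and only that bucket is searched for voting neighbors. The class below is a toy illustration, not the paper's scheme; a learned Mahalanobis metric could be accommodated by linearly transforming the features before hashing.

```python
import numpy as np
from collections import Counter, defaultdict

class RandomProjectionLSH:
    """Toy locality-sensitive hash for approximate kNN voting: points whose
    random-projection sign patterns match land in the same bucket, and only
    that bucket is searched for neighbors."""
    def __init__(self, dim, n_bits=12, seed=0):
        rng = np.random.default_rng(seed)
        self.planes = rng.normal(size=(n_bits, dim))   # random hyperplanes
        self.buckets = defaultdict(list)

    def _key(self, x):
        return tuple((self.planes @ np.asarray(x, dtype=float) > 0).astype(int))

    def fit(self, X, y):
        for xi, yi in zip(np.asarray(X, dtype=float), y):
            self.buckets[self._key(xi)].append((xi, yi))
        return self

    def predict(self, x, k=5):
        candidates = self.buckets.get(self._key(x), [])
        if not candidates:
            return None                                # no collision: fall back to exact search in practice
        d = [np.linalg.norm(xi - np.asarray(x, dtype=float)) for xi, _ in candidates]
        nearest = [candidates[i][1] for i in np.argsort(d)[:k]]
        return Counter(nearest).most_common(1)[0][0]   # majority vote among bucket neighbors
```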
|
30 |
ENABLING HYDROLOGICAL INTERPRETATION OF MONTHLY TO SEASONAL PRECIPITATION FORECASTS IN THE CORE NORTH AMERICAN MONSOON REGION / Maitaria, Kazungu January 2009 (has links)
The aim of the research undertaken in this dissertation was to use medium-range to seasonal precipitation forecasts for hydrologic applications in catchments in the core North American Monsoon (NAM) region. To this end, it was necessary to develop a better understanding of the physical and statistical relationships between runoff processes and the temporal statistics of rainfall. To achieve this goal, statistically downscaled estimates of warm-season precipitation over the core region of the North American Monsoon Experiment (NAME) were developed. Currently, NAM precipitation is poorly predicted on local and regional scales by Global Circulation Models (GCMs). The downscaling technique used here, the K-Nearest Neighbor (KNN) model, combines information from retrospective GCM forecasts with simultaneous historical observations to infer statistical relationships between the low-resolution GCM fields and locally observed precipitation records. The stochastic nature of monsoon rainfall presents significant challenges for downscaling efforts and therefore necessitates regionalization and an ensemble, or probabilistic, approach to quantitative precipitation forecasting. It was found that regionalizing the precipitation climatology prior to downscaling with KNN offered significant advantages in terms of improved skill scores.
Selected output variables from retrospective ensemble runs of the National Centers for Environmental Prediction's medium-range forecast (MRF) model were fed into the KNN downscaling model. The quality of the downscaled precipitation forecasts was evaluated in terms of a standard suite of ensemble verification metrics. This study represents the first time the KNN model has been successfully applied within a warm-season convective climate regime and shown to produce skillful and reliable ensemble forecasts of daily precipitation out to a lead time of four to six days, depending on the forecast month.
Knowledge of the behavior of the regional hydrologic systems in the NAM was transferred into a modeling framework aimed at improving intra-seasonal hydrologic predictions. To this end, a robust lumped-parameter computational model of intermediate conceptual complexity was calibrated and applied to generate streamflow in three unregulated test basins in the core region of the NAM, and the modeled response to different time-accumulated KNN-generated precipitation forcings was investigated. Although the model had some difficulty in accurately simulating hydrologic fluxes on the basis of Hortonian runoff principles alone, the preliminary results achieved from this study are encouraging. The primary and most novel finding from this study is the improved predictability of the NAM system using state-of-the-art ensemble forecasting systems. Additionally, this research significantly enhanced the utility of the MRF ensemble forecasts and made them reliable for regional hydrologic applications. Finally, monthly streamflow simulations from an ensemble-based approach have been demonstrated; the estimated ensemble forecasts provide quantitative estimates of the uncertainty associated with our model forecasts.
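A common way to realize KNN downscaling is analog resampling, sketched below: the k historical days with the most similar GCM predictor fields are found, and the precipitation observed on those days is resampled with rank-based weights to form an ensemble. The predictors, k, and the 1/rank weights are assumptions for illustration, not the dissertation's exact configuration.

```python
import numpy as np

def knn_downscale(gcm_hist, obs_hist, gcm_today, k=10, n_members=25, seed=0):
    """KNN analog-resampling sketch: `gcm_hist` holds historical GCM predictor
    vectors, `obs_hist` the station precipitation observed on the same days, and
    `gcm_today` the predictor vector for the day being forecast."""
    rng = np.random.default_rng(seed)
    gcm_hist = np.asarray(gcm_hist, dtype=float)
    obs_hist = np.asarray(obs_hist, dtype=float)
    d = np.linalg.norm(gcm_hist - np.asarray(gcm_today, dtype=float), axis=1)
    neighbors = np.argsort(d)[:k]                 # indices of the k closest analog days
    weights = 1.0 / np.arange(1, k + 1)           # closer analogs are drawn more often
    weights /= weights.sum()
    draws = rng.choice(neighbors, size=n_members, p=weights)
    return obs_hist[draws]                        # ensemble of downscaled precipitation values
```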
|