1 |
Improving Estimation Accuracy of GPS-Based Arterial Travel Time Using K-Nearest Neighbors Algorithm
Li, Zheng January 2017
Link travel time plays a significant role in traffic planning, traffic management and Advanced Traveler Information Systems (ATIS). A public probe vehicle dataset is a probe vehicle dataset collected from members of the public or from public transport. The emergence of public probe vehicle datasets supports travel time collection at a large temporal and spatial scale at a relatively low cost. Traditionally, link travel time is the aggregation of travel times across different movements. A recent study showed that the link travel times of individual movements differ significantly from their aggregation. However, there is still no complete framework for estimating movement-based link travel time. In addition, probe vehicle datasets usually have a low penetration rate, and no previous study has solved this problem.
To solve the problems above, this study proposed a detailed framework to estimate movement-based link travel time using a public probe vehicle dataset with a high sampling rate. Our study proposed a k-Nearest Neighbors (k-NN) regression method that increases the number of travel time samples by using incomplete trajectories. An incomplete trajectory was compared with historical complete trajectories, and the link travel time of the incomplete trajectory was represented by that of its similar complete trajectories. The results showed that the method can significantly increase the number of link travel time samples, although limitations remain. In addition, our study investigated the performance of k-NN regression under different parameters and input data. The sensitivity analysis showed that the algorithm performed differently under different parameters and input data. Our study suggests that optimal parameters should be selected using a historical dataset before real-world application.
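The k-NN regression idea in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the feature representation (travel times on the sub-segments a vehicle did cover), the Euclidean distance, the unweighted mean, and the function name `knn_estimate` are all hypothetical choices made for the example.

```python
import math

def knn_estimate(partial, history, k=3):
    """Estimate the link travel time of an incomplete trajectory.

    partial: observed features of the incomplete trajectory (assumed here to
             be travel times in seconds on the sub-segments it covered).
    history: list of (features, link_travel_time) pairs taken from
             complete historical trajectories.
    """
    if not history:
        raise ValueError("history must not be empty")
    # Rank complete trajectories by Euclidean distance to the partial one.
    ranked = sorted(history, key=lambda item: math.dist(partial, item[0]))
    neighbors = ranked[:k]
    # Represent the incomplete trajectory by the mean of its k nearest neighbors.
    return sum(travel_time for _, travel_time in neighbors) / len(neighbors)

# Hypothetical historical data: (sub-segment times, full link travel time).
history = [
    ([10.0, 12.0], 40.0),
    ([11.0, 13.0], 44.0),
    ([30.0, 35.0], 120.0),
    ([9.0, 11.0], 36.0),
]
estimate = knn_estimate([10.0, 12.5], history, k=3)
```

Both `k` and the distance function are the kind of parameters the abstract recommends tuning on a historical dataset before deployment.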
|
2 |
Classification Analytics in Functional Neuroimaging: Calibrating Signal Detection Parameters
Fisher, Julia Marie January 2015
Classification analyses are a promising way to localize signal, especially scattered signal, in functional magnetic resonance imaging data. However, there is not yet a consensus on the most effective analysis pathway. We explore the efficacy of k-Nearest Neighbors classifiers on simulated functional magnetic resonance imaging data. We utilize a novel construction of the classification data. Additionally, we vary the spatial distribution of signal, the design matrix of the linear model used to construct the classification data, and the feature set available to the classifier. Results indicate that the k-Nearest Neighbors classifier is not sufficient under the current paradigm to adequately classify neural data and localize signal. Further exploration of the data using k-means clustering indicates that this is likely due in part to the amount of noise present in each data point. Suggestions are made for further research.
|
3 |
Nonparametric tests to detect relationship between variables in the presence of heteroscedastic treatment effects
Tolos, Siti January 1900
Doctor of Philosophy / Department of Statistics / Haiyan Wang / Statistical tools to detect nonlinear relationships between variables are commonly needed in practice. The first part of the dissertation presents a test of independence between a response variable, either discrete or continuous, and a continuous covariate after adjusting for heteroscedastic treatment effects. The method first augments each pair of the data for all treatments with a fixed number of nearest neighbors as pseudo-replicates. A test statistic is then constructed by taking the difference of two quadratic forms. Using such differences eliminates the need to estimate any nonlinear regression function, reducing the computational time. Although using a fixed number of nearest neighbors makes inference significantly more difficult than when the number of nearest neighbors goes to infinity, the parametric standardizing rate is obtained for the asymptotic distribution of the proposed test statistics. Numerical studies show that the new test procedure maintains the intended type I error rate and has robust power to detect nonlinear dependency in the presence of outliers. The second part of the dissertation discusses the theory and numerical studies for testing the nonparametric effects of no covariate-treatment interaction and no main covariate, based on the decomposition of the conditional mean of a regression function that is potentially nonlinear. A similar test was discussed in Wang and Akritas (2006) for the effects defined through the decomposition of the conditional distribution function, but with the number of pseudo-replicates going to infinity. Consequently, their test statistics have slow convergence rates and computational speeds. Both limitations are overcome using the new model and tests. The last part of the dissertation develops theory and numerical studies to test for no covariate-treatment interaction, no simple covariate and no main covariate effects for cases when the number of factor levels and the number of covariate values are large.
|
4 |
Evaluating the use of neighborhoods for query dependent estimation of survival prognosis for oropharyngeal cancer patients
Shay, Keegan P. 01 May 2019
Oropharyngeal cancer diagnoses make up three percent of all cancer diagnoses in the United States per year. Recently, there has been an increase in the incidence of HPV-associated oropharyngeal cancer, necessitating updates to prior survival estimation techniques in order to properly account for this shift in demographics. Clinicians depend on accurate survival prognosis estimates in order to create successful treatment plans that aim to maximize patient life while minimizing adverse treatment side effects. Additionally, recent advances in data analysis have resulted in richer and more complex data, motivating the use of more advanced data analysis techniques. Incorporating sophisticated survival analysis techniques can leverage complex data from a variety of sources, resulting in improved personalized prediction. Current survival prognosis prediction methods often rely on summary statistics and underlying assumptions regarding distribution or overall risk.
We propose a k-nearest neighbor influenced approach for predicting oropharyngeal survival outcomes. We evaluate our approach for overall survival (OS), recurrence-free survival (RFS), and recurrence-free overall survival (RF+OS). We define two distance functions, not subject to the curse of dimensionality, in order to reconcile heterogeneous features with patient-to-patient similarity scores to produce a meaningful overall measure of distance. Using these distance functions, we obtain the k-nearest neighbors for each patient, forming neighborhoods of similar patients. We leverage these neighborhoods for prediction in two novel ensemble methods. The first ensemble method uses the nearest neighbors for each patient to combine globally trained predictions, weighted by their accuracies within a selected neighborhood. The second ensemble method combines Kaplan-Meier predictions from a variety of neighborhoods. Both proposed methods outperform an ensemble of standard global survival predictive models, with statistically significant calibration.
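To illustrate how a Kaplan-Meier prediction can be computed over a neighborhood of similar patients, here is a minimal sketch. It is not the thesis's method: the two distance functions defined in the thesis are not reproduced, so `distance` is passed in as a parameter, and the record fields (`features`, `time`, `event`), the absolute-difference placeholder distance, and the sample patients are all hypothetical.

```python
def kaplan_meier(samples):
    """Return [(time, survival_probability)] at each distinct event time.

    samples: list of (time, event) pairs; event is True for an observed
             event and False for a censored observation.
    """
    survival = 1.0
    curve = []
    for t in sorted({time for time, event in samples if event}):
        # Patients still under observation just before time t.
        at_risk = sum(1 for time, _ in samples if time >= t)
        events = sum(1 for time, event in samples if time == t and event)
        survival *= 1.0 - events / at_risk
        curve.append((t, survival))
    return curve

def neighborhood_curve(query, patients, k, distance):
    """Kaplan-Meier curve over the k patients closest to `query`."""
    nearest = sorted(patients, key=lambda p: distance(query, p["features"]))[:k]
    return kaplan_meier([(p["time"], p["event"]) for p in nearest])

# Hypothetical patients with a single numeric feature.
patients = [
    {"features": 1.0, "time": 5, "event": True},
    {"features": 1.2, "time": 8, "event": False},  # censored
    {"features": 0.9, "time": 10, "event": True},
    {"features": 5.0, "time": 2, "event": True},   # dissimilar, excluded at k=3
]
curve = neighborhood_curve(1.0, patients, k=3, distance=lambda a, b: abs(a - b))
```

An ensemble in the spirit of the abstract would combine several such curves computed from different neighborhoods; how they are combined is left out here because the abstract does not specify it.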
|
5 |
Classification of Twitter disaster data using a hybrid feature-instance adaptation approach
Mazloom, Reza January 1900
Master of Science / Department of Computer Science / Doina Caragea / Huge amounts of data that are generated on social media during emergency situations are regarded as troves of critical information. The use of supervised machine learning techniques in the early stages of a disaster is challenged by the lack of labeled data for that particular disaster. Furthermore, supervised models trained on labeled data from a prior disaster may not produce accurate results.
To address these challenges, domain adaptation approaches, which learn models for predicting the target, by using unlabeled data from the target disaster in addition to labeled data from prior source disasters, can be used. However, the resulting models can still be affected by the variance between the target domain and the source domain.
In this context, we propose a hybrid feature-instance adaptation approach based on matrix factorization and the k-nearest neighbors algorithm, respectively. The proposed hybrid adaptation approach is used to select a subset of the source disaster data that is representative of the target disaster. The selected subset is subsequently used to learn accurate supervised or domain adaptation Naïve Bayes classifiers for the target disaster. In other words, this study focuses on transforming the existing source data to bring it closer to the target data, thus overcoming the domain variance which may prevent effective transfer of information from source to target. A combination of selective and transformative methods is used on instances and features, respectively. We show experimentally that the proposed approaches are effective in transferring information from source to target. Furthermore, we provide insights into which types and combinations of selections and transformations result in more accurate models for the target.
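One simple way to picture k-NN-based instance selection is sketched below. It is an assumption-laden illustration, not the procedure from the thesis: it keeps every source instance that is among the k nearest source neighbors of at least one target instance, uses Euclidean distance on already-extracted feature vectors, and omits the matrix factorization step entirely. The function name `select_source_instances` and the sample vectors are hypothetical.

```python
import math

def select_source_instances(source, target, k=1):
    """Return sorted indices of source instances close to the target data.

    source, target: lists of equal-length numeric feature vectors.
    A source instance is kept if it is one of the k nearest source
    neighbors of at least one target instance.
    """
    selected = set()
    for t in target:
        ranked = sorted(range(len(source)), key=lambda i: math.dist(source[i], t))
        selected.update(ranked[:k])
    return sorted(selected)

# Hypothetical feature vectors for source and target disaster tweets.
source = [[0.0, 0.0], [1.0, 1.0], [10.0, 10.0], [0.5, 0.2]]
target = [[0.1, 0.1], [0.9, 1.2]]
kept = select_source_instances(source, target, k=1)
```

The selected subset would then be used to train the target classifier; the outlying source instance at `[10.0, 10.0]` is never chosen because no target instance lies near it.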
|
6 |
Pattern Recognition applied to Continuous integration system.
VANGALA, SHIVAKANTHREDDY January 2018
Context: This thesis focuses on regression testing in the continuous integration environment, that is, integration testing that ensures that changes made in new development code do not introduce new faults into the software product. Continuous integration is a software development practice that integrates all development, testing, and deployment activities. In continuous integration, regression testing is done by manually selecting and prioritizing test cases from a larger set of test cases. The main challenge with manual test case selection and prioritization is that, in some cases, needed test cases are missing from the selected subset because testers did not include them when designing the hourly-cycle regression test suite for a particular feature under development. Ericsson, the company in whose environment this thesis was conducted, therefore aims to improve its test case selection and prioritization in regression testing using pattern recognition.
Objectives: This thesis proposes prediction models that use pattern recognition algorithms to predict future test case failures from historical data. This helps improve the quality of the continuous integration environment by selecting an appropriate subset of test cases from a larger set for regression testing. Several candidate pattern recognition algorithms are promising for predicting test case failures. Based on the characteristics of the data collected at Ericsson, suitable pattern recognition algorithms are selected and predictive models are built. Finally, two predictive models are evaluated and the best-performing model is integrated into the continuous integration system.
Methods: The experiment research method was chosen because discovering cause-and-effect relationships between dependent and independent variables can be used to evaluate the predictive model. The experiment is conducted in RStudio, which facilitates training the predictive models on historical continuous integration data. The predictive ability of the algorithms is evaluated using prediction accuracy metrics.
Results: After implementing two predictive models (neural networks and k-nearest neighbors) using continuous integration data, neural networks achieved a prediction accuracy of 75.3% and k-nearest neighbors achieved 67.75%.
Conclusions: This research investigated the feasibility of adaptive, self-learning test machinery based on pattern recognition in a continuous integration environment, to improve test case selection and prioritization in regression testing. Neural networks predicted failing test cases more effectively than k-nearest neighbors, with an accuracy of 75.3%. A predictive model can make continuous integration more efficient only if it has 100% prediction capability; a prediction capability of 75.3% will not make the continuous integration system more efficient than the present static test case selection and prioritization, because roughly 25% of predictions are missed. This research can therefore only conclude that neural networks currently have 75.3% prediction capability, but that this may approach 100% in the future when more data is available. The present Ericsson continuous integration system needs to improve its storage of historical data; at present it can store only 30 days of historical data, whereas the predictive models require large amounts of data to give good predictions.
Ericsson currently uses the Jenkins automation server to support continuous integration. Other automation servers, such as TeamCity, Travis CI, GoCD, and CircleCI, can store more than 30 days of data; using them would mitigate the data storage problem.
|
7 |
Automatic Pain Assessment from Infants’ Crying Sounds
Pai, Chih-Yun 01 November 2016
Crying is the means infants use to express their emotional state. It gives parents and nurses a criterion for understanding infants’ physiological state. Many researchers have analyzed infants’ crying sounds to diagnose specific diseases or to identify the reasons for crying. This thesis presents an automatic crying level assessment system that classifies infants’ crying sounds, recorded under realistic conditions in the Neonatal Intensive Care Unit (NICU), as whimpering or vigorous crying. To analyze the crying signal, Welch’s method and Linear Predictive Coding (LPC) are used to extract spectral features; the average and the standard deviation of the frequency signal and the maximum power spectral density are the other spectral features used in classification. For classification, three state-of-the-art classifiers, namely K-nearest Neighbors, Random Forests, and Least Squares Support Vector Machine, are tested. The highest accuracy in classifying whimpering versus vigorous crying, 90%, is achieved on the clean dataset, sampled from 10 seconds before scoring to 5 seconds after scoring, with K-nearest Neighbors as the classifier.
|
8 |
Disc : Approximative Nearest Neighbor Search using Ellipsoids for Photon Mapping on GPUs / Disc : Approximativ närmaste grannsökning med ellipsoider för fotonmappning på GPU:er
Bergholm, Marcus, Kronvall, Viktor January 2016
Recent developments in Graphics Processing Units (GPUs) have enabled inexpensive high-performance computing for general-purpose applications. The K-Nearest Neighbors problem is widely used in applications ranging from classification to the gathering of photons in the Photon Mapping algorithm. Using the Euclidean distance measure when gathering photons can cause false bleeding of colors between surfaces. Ellipsoidal search boundaries for photon gathering are shown to reduce artifacts due to this false bleeding. Shifted Sorting has been found to yield high performance on GPUs while retaining a high approximation rate. This study presents an algorithm for approximately solving the K-Nearest Neighbors problem, modified to use a distance measure that creates an ellipsoidal search boundary. The ellipsoidal search boundary is used to alleviate the false bleeding of colors between surfaces in Photon Mapping. The Approximative K-Nearest Neighbors algorithm presented is a modification of the Shifted Sorting algorithm. The algorithm is found to be highly parallelizable and achieves 86% of the queries processed per millisecond of a reference implementation that uses the spherical search boundaries implied by the Euclidean distance. The rate of compression from spherical to ellipsoidal search boundary is appropriately chosen in the range 3.0 to 7.0. The algorithm is found to scale well with increases in both the number of data points and the number of query points.
/ Graphics processing units (GPUs) have recently enabled low-cost high-performance computing for general-purpose applications. The K-Nearest Neighbors problem has wide areas of application, from classification in machine learning to the gathering of photons in Photon Mapping for rendering three-dimensional scenes. Using Euclidean distances when gathering photons can lead to incorrect bleeding of colors between surfaces. Ellipsoidal search regions for photon gathering have been shown to reduce artifacts caused by this kind of incorrect color bleeding. Shifted Sorting has been shown to give high performance on GPUs without losing approximation quality. This report investigates how the approximate variant of the K-Nearest Neighbors algorithm with Shifted Sorting performs on GPUs when the distance measure is modified so that an ellipsoidal search region is formed. The algorithm is used to reduce the problem of incorrect color bleeding in Photon Mapping. The algorithm is shown to be highly parallelizable and achieves 86% of the queries processed per millisecond of a reference implementation that uses spherical search regions. The compression rate along the surface normal of the query point is advantageously chosen in the range 3.0 to 7.0. The algorithm is shown to scale well with increases in both the number of data points and the number of query points.
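A distance measure that produces an ellipsoidal search region can be sketched as below. This is one plausible formulation, assumed for illustration rather than taken from the thesis: the offset between query and photon is split into a component along the query point's surface normal and a tangential remainder, and the normal component is scaled by a compression factor. The function name and the default factor of 5.0 (inside the 3.0 to 7.0 range mentioned in the abstract) are hypothetical.

```python
import math

def ellipsoidal_distance(query, normal, point, compression=5.0):
    """Distance that penalizes displacement along the surface normal.

    query, point: 3D positions; normal: surface normal at the query point
    (need not be unit length). With compression == 1.0 this reduces to the
    ordinary Euclidean distance.
    """
    offset = [p - q for p, q in zip(point, query)]
    length = math.sqrt(sum(n * n for n in normal))
    unit = [n / length for n in normal]
    # Signed component of the offset along the normal.
    along = sum(o * u for o, u in zip(offset, unit))
    # Squared length of what remains in the tangent plane (clamped for rounding).
    tangential_sq = max(sum(o * o for o in offset) - along * along, 0.0)
    return math.sqrt(tangential_sq + (compression * along) ** 2)

origin = (0.0, 0.0, 0.0)
up = (0.0, 0.0, 1.0)
on_surface = ellipsoidal_distance(origin, up, (1.0, 0.0, 0.0))
off_surface = ellipsoidal_distance(origin, up, (0.0, 0.0, 1.0))
```

Under this measure a photon one unit away within the surface plane stays at distance 1, while a photon one unit above the surface is treated as five units away, so photons from a nearby but different surface are less likely to be gathered.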
|
9 |
Discovery of Outlier Points and Dense Regions in Large Data-Sets Using Spark Environment
Nadella, Pravallika 04 October 2021
No description available.
|
10 |
Predicting Bridge Deck Condition Ratings Using K-Nearest Neighbors Algorithm for National Bridge Inventory
Pallepogu, Avinash January 2022
No description available.
|