1 |
Identification of Driving Styles in Buses
Karginova, Nadezda January 2010
It is important to detect faults in bus components at an early stage. Because driving style affects the breakdown of different bus components, identifying the driving style is important for minimizing the number of failures in buses.
The driving style of the driver was identified from input data containing examples of driving runs of each class. K-nearest neighbor and neural network algorithms were used, and different models were tested.
The results were shown to depend on the selected driving runs. A hypothesis was suggested that examples from different driving runs have different parameters, which affects the classification results.
The best results were achieved by using a subset of variables chosen with the help of the forward feature selection procedure. The percentage of correct classifications is about 89-90 % for the k-nearest neighbor algorithm and 88-93 % for the neural networks.
Feature selection significantly improved the results of the k-nearest neighbor algorithm, and also the results of the neural network algorithm when the training and testing data sets were selected from different driving runs. On the other hand, feature selection did not affect the neural network results when the training and testing data sets were selected from the same driving runs.
Another way to improve the results is smoothing: computing the average class over a number of consecutive examples decreased the error.
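The smoothing step described in this abstract can be illustrated with a minimal sketch. It assumes that "average class" can be approximated by a majority vote over a centered window of consecutive predictions; the function name `smooth_labels`, the window size, and the class names `calm` and `aggressive` are hypothetical and do not come from the thesis.

```python
from collections import Counter


def smooth_labels(predicted, window=3):
    """Replace each predicted label with the most common label in a
    centered window of consecutive examples (hypothetical window size)."""
    half = window // 2
    smoothed = []
    for i in range(len(predicted)):
        lo = max(0, i - half)
        hi = min(len(predicted), i + half + 1)
        counts = Counter(predicted[lo:hi])
        # most_common(1) yields the label with the highest count in the window
        smoothed.append(counts.most_common(1)[0][0])
    return smoothed


# Hypothetical per-example classifier output for one driving run.
raw = ["calm", "calm", "aggressive", "calm", "calm",
       "aggressive", "aggressive", "calm", "aggressive", "aggressive"]
smoothed = smooth_labels(raw, window=3)
print(smoothed)
```

In this toy run the isolated outliers at positions 2 and 7 are absorbed by their neighbors, which is the kind of error reduction the abstract attributes to smoothing.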
|
3 |
Machine Learning for Malware Detection in Network Traffic
Omopintemi, A.H., Ghafir, Ibrahim, Eltanani, S., Kabir, Sohag, Lefoane, Moemedi 19 December 2023
Developing advanced and efficient malware detection systems is becoming significant in light of the growing threat landscape in cybersecurity. This work aims to tackle the enduring problem of identifying malware and protecting digital assets from cyber-attacks. Conventional methods frequently prove ineffective in adjusting to the ever-evolving field of harmful activity. As such, novel approaches that improve precision while simultaneously taking into account the ever-changing landscape of modern cybersecurity problems are needed. To address this problem, this research focuses on the detection of malware in network traffic. This work proposes a machine-learning-based approach for malware detection, with particular attention to the Random Forest (RF), Support Vector Machine (SVM), and Adaboost algorithms. The model’s performance was evaluated using an assessment matrix that included Accuracy (AC) for overall performance, Precision (PC) for positive predicted values, Recall Score (RS) for genuine positives, and the F1 Score (SC) for a balanced viewpoint. A performance comparison was carried out, and the results reveal that the model built with Adaboost has the best performance. The TPR is over 97% and the FPR is below 4% for each of the three classifiers. The model created in this paper has the potential to help organizations or experts anticipate and handle malware. The proposed model can be used to make forecasts and provide management solutions in the network’s everyday operational activities.
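The evaluation measures named in this abstract follow from the four confusion-matrix counts. The sketch below uses the standard textbook definitions for a binary malware/benign task; the function name `binary_metrics` and the toy labels are hypothetical and are not the paper's data or results.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall (TPR), FPR and F1 for 0/1 labels,
    where 1 marks the positive (malware) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # recall is the TPR
    fpr = fp / (fp + tn) if fp + tn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "fpr": fpr,
        "f1": f1,
    }


# Toy labels, invented for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```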
|
4 |
Nearest Neighbor Foreign Exchange Rate Forecasting with Mahalanobis Distance
Pathirana, Vindya Kumari 01 January 2015
Foreign exchange (FX) rate forecasting has been a challenging area of study in the past. Various linear and nonlinear methods have been used to forecast FX rates. As the currency data are nonlinear and highly correlated, forecasting through nonlinear dynamical systems is becoming more relevant. The nearest neighbor (NN) algorithm is one of the most commonly used nonlinear pattern recognition and forecasting methods, and it outperforms the available linear forecasting methods for high-frequency foreign exchange data. The basic idea behind the NN algorithm is to capture the local behavior of the data by selecting the instances that have similar dynamic behavior. Only the k histories most relevant to the present dynamical structure are used to predict the future. For this reason, the NN algorithm is also known as the k-nearest neighbor (k-NN) algorithm, where k is the number of chosen neighbors.
In the k-nearest neighbor forecasting procedure, similar instances are captured through a distance function. Since the forecasts depend completely on the chosen nearest neighbors, the distance plays a key role in the k-NN algorithm. By choosing an appropriate distance, we can improve the performance of the algorithm significantly. In the past, the most commonly used distance for k-NN forecasting was the Euclidean distance. Because vectors at different time frames may be correlated, distances based on deterministic vectors, such as the Euclidean distance, are not very appropriate when applied to foreign exchange data. Since the Mahalanobis distance captures these correlations, we suggest using it in the selection of neighbors.
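A minimal sketch of neighbor selection with the Mahalanobis distance follows. It is an illustration under stated assumptions, not the thesis's procedure: histories are taken to be delay vectors of length `m`, the covariance is estimated from all histories, and the forecast is the unweighted mean of the neighbors' next values. All function names and the synthetic series are hypothetical.

```python
import math


def covariance(vectors):
    """Sample covariance matrix of a list of equal-length vectors."""
    n, m = len(vectors), len(vectors[0])
    means = [sum(v[j] for v in vectors) / n for j in range(m)]
    return [[sum((v[a] - means[a]) * (v[b] - means[b]) for v in vectors) / (n - 1)
             for b in range(m)] for a in range(m)]


def solve(matrix, rhs):
    """Solve matrix @ z = rhs by Gaussian elimination with partial pivoting.
    Raises ZeroDivisionError if the matrix is singular."""
    m = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, m):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0.0] * m
    for i in reversed(range(m)):
        tail = sum(aug[i][j] * z[j] for j in range(i + 1, m))
        z[i] = (aug[i][m] - tail) / aug[i][i]
    return z


def mahalanobis(x, y, cov):
    """sqrt((x - y)^T cov^{-1} (x - y)), computed without forming the inverse."""
    diff = [a - b for a, b in zip(x, y)]
    z = solve(cov, diff)
    return math.sqrt(max(0.0, sum(d * zi for d, zi in zip(diff, z))))


def knn_forecast(series, m=2, k=2):
    """One-step forecast: mean successor of the k histories nearest to the
    most recent m values (embedding length m and k are assumptions)."""
    ends = range(m - 1, len(series) - 1)
    histories = [series[t - m + 1:t + 1] for t in ends]
    successors = [series[t + 1] for t in ends]
    query = series[-m:]
    cov = covariance(histories)
    ranked = sorted(range(len(histories)),
                    key=lambda i: mahalanobis(histories[i], query, cov))
    return sum(successors[i] for i in ranked[:k]) / k


# Synthetic periodic series, not real FX data.
series = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0, 1.0, 2.0, 3.0, 1.0, 2.0]
forecast = knn_forecast(series, m=2, k=2)
print(forecast)
```

With an identity covariance matrix the distance reduces to the Euclidean distance, so the two differ only when the components of the history vectors are correlated or have unequal variances.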
In the present study, we used five different foreign currencies, which are among the most traded currencies, to compare the performance of the k-NN algorithm with the traditional Euclidean and absolute distances against its performance with the proposed Mahalanobis distance. The performances were compared in two ways: (i) forecast accuracy and (ii) transforming the forecasts into a more effective technical trading rule. The results, obtained with real FX trading data, showed that the method introduced in this work outperforms the other popular methods.
Furthermore, we conducted a thorough investigation of optimal parameter choice with different distance measures. We applied the concept of distance-based weighting to the NN algorithm and compared its performance with forecasting based on the traditional unweighted NN algorithm.
Time series forecasting methods, such as the autoregressive integrated moving average (ARIMA) process, are widely used as a forecasting technique in many areas of time series analysis. We compared the performance of the proposed Mahalanobis distance based k-NN forecasting procedure with the traditional general ARIMA-based forecasting algorithm. In this case the forecasts were also transformed into a technical trading strategy to create buy and sell signals. The two methods were evaluated for their forecasting accuracy and trading performance.
Multi-step ahead forecasting is an important aspect of time series forecasting. Although many researchers claim that the k-nearest neighbor forecasting procedure outperforms linear forecasting methods for financial time series data, and the available work in the literature supports this claim for one-step-ahead forecasting, one of our goals in this work was to improve FX trading with multi-step ahead forecasting. A popular multi-step ahead forecasting strategy was adopted in our work to obtain forecasts more than one day ahead. We performed a comparative study of the performance of the single-step ahead and multi-step ahead trading strategies, using data for five foreign currencies with the Mahalanobis distance based k-nearest neighbor algorithm.
|
5 |
Classification of Genotype and Age by Spatial Aspects of RPE Cell Morphology
Boring, Michael 12 August 2014
Age-related macular degeneration (AMD) is a public health concern in an aging society. The retinal pigment epithelium (RPE) layer of the eye is a principal site of pathogenesis for AMD. Morphological characteristics of the cells in the RPE layer can be used to discriminate age and disease status of individuals. In this thesis, three genotypes of mice of various ages are used to study the predictive abilities of these characteristics. The disease state is represented by two mutant genotypes and the healthy state by the wild type. Classification analysis is applied to the RPE morphology from the different spatial regions of the RPE layer. Variable reduction is accomplished by principal component analysis (PCA) and classification analysis by the k-nearest neighbor (k-NN) algorithm. In this way the differential ability of the spatial regions to predict age and disease status by cellular variables is explored.
|
6 |
A Data Mining Framework To Detect Tariff Code Circumvention In Turkish Customs Database
Bastabak, Burcu 01 September 2012
Customs and foreign trade regulations are made to regulate import and export activities. The majority of these regulations apply to import procedures. The country of origin and the tariff code become important when determining the tax amount of the merchandise in importation.
Anti-dumping duty is defined as a financial penalty, published by the Ministry of Economy, enforced for suspiciously low priced imports in order to protect the local industry from unfair competition. It is accrued according to the tariff code and the country of origin. To avoid this obligation and the associated tax, a tariff code different from the original one may be declared on the customs declaration; this practice is called "Tariff Code Circumvention". To identify such misdeclarations, a physical examination of the merchandise is required. However, with limited personnel resources, the physical examination of all imported merchandise is not possible.
In this study, a data mining framework is developed on the Turkish customs database in order to detect "Tariff Code Circumvention". For this purpose, four types of products, which are the most circumvented goods in Turkish customs, were chosen. First, with the help of the Risk Analysis Office, the significant features are identified. Then, the Infogain algorithm is used to rank these features. Finally, the KNN algorithm is applied to the Turkish customs database in order to identify the circumvented goods automatically. The results show that the framework is able to find such circumvented goods successfully.
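The feature-ranking step can be sketched with the standard information gain definition (class entropy minus the entropy remaining after splitting on a feature). This is an illustration for categorical features only; the field names `origin` and `price_band`, the labels, and the four records are invented and do not come from the customs database.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus the weighted entropy within each
    group of records that share the same feature value."""
    groups = defaultdict(list)
    for value, label in zip(feature_values, labels):
        groups[value].append(label)
    conditional = sum(len(g) / len(labels) * entropy(g)
                      for g in groups.values())
    return entropy(labels) - conditional


def rank_features(rows, labels):
    """Return (feature, gain) pairs sorted from most to least informative."""
    gains = {name: information_gain([row[name] for row in rows], labels)
             for name in rows[0]}
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)


# Invented records with hypothetical field names.
rows = [
    {"origin": "A", "price_band": "low"},
    {"origin": "B", "price_band": "low"},
    {"origin": "A", "price_band": "high"},
    {"origin": "B", "price_band": "high"},
]
labels = ["circumvented", "circumvented", "ok", "ok"]
ranking = rank_features(rows, labels)
print(ranking)
```

In this toy data `price_band` separates the classes perfectly and `origin` carries no information, so a downstream k-NN classifier would be given `price_band` first.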
|
7 |
Učení založené na instancích / Instance based learning
Martikán, Miroslav January 2009
This thesis focuses on instance-based learning algorithms. The main goal is to create an application for educational purposes. Instance-based learning (IBL) algorithms, nearest neighbor algorithms and kd-trees are described theoretically. The practical part covers the development of a tutorial application. The application can generate data, classify them with the nearest neighbor algorithm, and test the IB1, IB2 and IB3 algorithms.
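As a rough illustration of the algorithms named above, the sketch below follows the usual textbook description: IB1 classifies a point by its single nearest stored instance, and IB2 stores a training instance only when the instances stored so far misclassify it. The function names and the toy data are hypothetical, the thesis's application may differ, and IB3 (which additionally filters unreliable instances) is not covered.

```python
import math


def ib1_classify(stored, point):
    """Return the label of the stored instance nearest to point (Euclidean)."""
    features, label = min(stored, key=lambda item: math.dist(item[0], point))
    return label


def ib2_train(examples):
    """Keep only the examples that the current stored set misclassifies."""
    stored = []
    for features, label in examples:
        if not stored or ib1_classify(stored, features) != label:
            stored.append((features, label))
    return stored


# Toy two-class data, invented for illustration.
examples = [
    ((0.0, 0.0), "a"),
    ((0.0, 1.0), "a"),
    ((5.0, 5.0), "b"),
    ((5.0, 6.0), "b"),
    ((1.0, 0.0), "a"),
]
stored = ib2_train(examples)
print(stored)
print(ib1_classify(stored, (4.0, 4.0)))
```

Here IB2 retains two of the five examples, which shows the storage reduction it trades against sensitivity to the order of presentation.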
|
8 |
TOP-K AND SKYLINE QUERY PROCESSING OVER RELATIONAL DATABASE
Samara, Rafat January 2012
Top-k and skyline queries have long been studied in the database and information retrieval communities, and they are two popular operations for preference retrieval. A top-k query returns a subset of the most relevant answers instead of all answers. Efficient top-k processing retrieves the k objects that have the highest overall score. In this paper, some algorithms used for efficient top-k processing in different scenarios are presented. A framework based on existing algorithms that takes cost-based optimization into account and works for these scenarios is presented. This framework is intended for cases where the user can determine the ranking function. A real-life scenario is applied to this framework step by step. A skyline query returns the set of points that are not dominated by other points in the given datasets (a record x dominates another record y if x is as good as y in all attributes and strictly better in at least one attribute). In this paper, some algorithms used for evaluating the skyline query are introduced. One of the problems of the skyline query, the curse of dimensionality, is discussed. A new strategy based on existing skyline algorithms, skyline frequency and the binary tree strategy, which gives a good solution to this problem, is presented. This new strategy is intended for cases where the user cannot determine the ranking function. A real-life scenario is presented that applies this strategy step by step. Finally, the advantages of the top-k query are applied to the skyline query in order to retrieve results quickly and efficiently.
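The dominance definition quoted in this abstract translates directly into code. The sketch below is a naive quadratic-time illustration, not one of the algorithms the thesis surveys; it assumes that smaller values are better in every attribute, and the `(price, distance)` records and the scoring function are invented.

```python
import heapq


def dominates(x, y):
    """x dominates y if x is at least as good in every attribute and
    strictly better in at least one (assumption: smaller is better)."""
    return (all(a <= b for a, b in zip(x, y))
            and any(a < b for a, b in zip(x, y)))


def skyline(points):
    """Return the points that no other point dominates (naive O(n^2) scan)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def top_k(points, score, k):
    """Return the k points with the lowest score under a user-supplied
    ranking function (lower score means better here)."""
    return heapq.nsmallest(k, points, key=score)


# Invented (price, distance) records.
points = [(50, 8), (80, 2), (60, 5), (90, 9), (70, 6)]
sky = skyline(points)
best = top_k(points, score=lambda p: p[0] + 10 * p[1], k=2)
print(sky)
print(best)
```

The contrast matches the abstract: `top_k` needs the user to supply a ranking function, while `skyline` needs only the per-attribute preference direction.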
|
9 |
Nächste-Nachbar basierte Methoden in der nichtlinearen Zeitreihenanalyse / Nearest-neighbor based methods for nonlinear time-series analysis
Merkwirth, Christian 02 November 2000
No description available.
|