Spelling suggestions: "subject:"nearestneighbor"" "subject:"knearestneighbors""
1 |
Mutual k Nearest Neighbor based ClassifierGupta, Nidhi January 2010 (has links)
No description available.
|
2 |
Density Based Clustering using Mutual K-Nearest NeighborsDixit, Siddharth January 2015 (has links)
No description available.
|
3 |
Estudo da influência de diversas medidas de similaridade na previsão de séries temporais utilizando o algoritmo KNN-TSP / Study of the influence of similarity measures in Time Series Prediction with the kNN-TSP algorithmAikes Junior, Jorge 11 April 2012 (has links)
Made available in DSpace on 2017-07-10T17:11:50Z (GMT). No. of bitstreams: 1
JORGE AIKES JUNIOR.PDF: 2050278 bytes, checksum: f5bae18bbcb7465240488c45b2c813e7 (MD5)
Previous issue date: 2012-04-11 / Time series can be understood as any set of observations which are time ordered. Among the many possible tasks appliable to temporal data, one that has attracted increasing interest, due to its various applications, is the time series forecasting. The k-Nearest Neighbor - Time Series Prediction (kNN-TSP) algorithm is a non-parametric method for forecasting time series. One of its advantages, is its easiness application when compared to parametric methods. Even though its easier to define kNN-TSP s parameters, some issues remain opened. This research is focused on the study of one of these parameters: the similarity measure. This parameter was empirically evaluated using various similarity measures in a large set of time series, including artificial series with seasonal and chaotic characteristics, and several real world time series. It was also carried out a case study comparing the predictive accuracy of the kNN-TSP algorithm with the Moving Average (MA), univariate Seasonal Auto-Regressive Integrated Moving Average (SARIMA) and multivariate SARIMA methods in a time series of a Korean s hospital daily patients flow in the Emergency Department. This work also proposes an approach to the development of a hybrid similarity measure which combines characteristics from several measures. The research s result demonstrated that the Lp Norm s measures have an advantage over other measures evaluated, due to its lower computational cost and for providing, in general, greater accuracy in temporal data forecasting using the kNN-TSP algorithm. Although the literature in general adopts the Euclidean similarity measure to calculate de similarity between time series, the Manhattan s distance can be considered an interesting candidate for defining similarity, due to the absence of statistical significant difference and to its lower computational cost when compared to the Euclidian measure. The measure proposed in this work does not show significant results, but it is promising for further research. Regarding the case study, the kNN-TSP algorithm with only the similarity measure parameter optimized achieves a considerably lower error than the MA s best configuration, and a slightly greater error than the univariate e multivariate SARIMA s optimal settings presenting less than one percent of difference. / Séries temporais podem ser entendidas como qualquer conjunto de observações que se encontram ordenadas no tempo. Dentre as várias tarefas possíveis com dados temporais, uma que tem atraído crescente interesse, devido a suas várias aplicações, é a previsão de séries temporais. O algoritmo k-Nearest Neighbor - Time Series Prediction (kNN-TSP) é um método não-paramétrico de previsão de séries temporais que apresenta como uma de suas vantagens a facilidade de aplicação, quando comparado aos métodos paramétricos. Apesar da maior facilidade na determinação de seus parâmetros, algumas questões relacionadas continuam em aberto. Este trabalho está focado no estudo de um desses parâmetros: a medida de similaridade. Esse parâmetro foi avaliado empiricamente utilizando diversas medidas de similaridade em um grande conjunto de séries temporais que incluem séries artificiais, com características sazonais e caóticas, e várias séries reais. Foi realizado também um estudo de caso comparativo entre a precisão da previsão do algoritmo kNN-TSP e a dos métodos de Médias Móveis (MA), Auto-regressivos de Médias Móveis Integrados Sazonais (SARIMA) univariado e SARIMA multivariado, em uma série de fluxo diário de pacientes na Área de Emergência de um hospital coreano. Neste trabalho é ainda proposta uma abordagem para o desenvolvimento de uma medida de similaridade híbrida, que combine características de várias medidas. Os resultados obtidos neste trabalho demonstram que as medidas da Norma Lp apresentam vantagem sobre as demais medidas avaliadas, devido ao seu menor custo computacional e por apresentar, em geral, maior precisão na previsão de dados temporais utilizando o algoritmo kNN-TSP. Apesar de na literatura, em geral, a medida Euclidiana ser adotada como medida de similaridade, a medida Manhattan pode ser considerada candidata interessante para definir a similaridade entre séries temporais, devido a não apresentar diferença estatisticamente significativa com a medida Euclidiana e possuir menor custo computacional. A medida proposta neste trabalho, não apresenta resultados significantes, mas apresenta-se promissora para novas pesquisas. Com relação ao estudo de caso, o algoritmo kNN-TSP, com apenas o parâmetro de medida de similaridade otimizado, alcança um erro consideravelmente inferior a melhor configuração com MA, e pouco maior que as melhores configurações dos métodos SARIMA univariado e SARIMA multivariado, sendo essa diferença inferior a um por cento.
|
4 |
Identification of Driving Styles in BusesKarginova, Nadezda January 2010 (has links)
<p>It is important to detect faults in bus details at an early stage. Because the driving style affects the breakdown of different details in the bus, identification of the driving style is important to minimize the number of failures in buses.</p><p>The identification of the driving style of the driver was based on the input data which contained examples of the driving runs of each class. K-nearest neighbor and neural networks algorithms were used. Different models were tested.</p><p>It was shown that the results depend on the selected driving runs. A hypothesis was suggested that the examples from different driving runs have different parameters which affect the results of the classification.</p><p>The best results were achieved by using a subset of variables chosen with help of the forward feature selection procedure. The percent of correct classifications is about 89-90 % for the k-nearest neighbor algorithm and 88-93 % for the neural networks.</p><p>Feature selection allowed a significant improvement in the results of the k-nearest neighbor algorithm and in the results of the neural networks algorithm received for the case when the training and testing data sets were selected from the different driving runs. On the other hand, feature selection did not affect the results received with the neural networks for the case when the training and testing data sets were selected from the same driving runs.</p><p>Another way to improve the results is to use smoothing. Computing the average class among a number of consequent examples allowed achieving a decrease in the error.</p>
|
5 |
Identification of Driving Styles in BusesKarginova, Nadezda January 2010 (has links)
It is important to detect faults in bus details at an early stage. Because the driving style affects the breakdown of different details in the bus, identification of the driving style is important to minimize the number of failures in buses. The identification of the driving style of the driver was based on the input data which contained examples of the driving runs of each class. K-nearest neighbor and neural networks algorithms were used. Different models were tested. It was shown that the results depend on the selected driving runs. A hypothesis was suggested that the examples from different driving runs have different parameters which affect the results of the classification. The best results were achieved by using a subset of variables chosen with help of the forward feature selection procedure. The percent of correct classifications is about 89-90 % for the k-nearest neighbor algorithm and 88-93 % for the neural networks. Feature selection allowed a significant improvement in the results of the k-nearest neighbor algorithm and in the results of the neural networks algorithm received for the case when the training and testing data sets were selected from the different driving runs. On the other hand, feature selection did not affect the results received with the neural networks for the case when the training and testing data sets were selected from the same driving runs. Another way to improve the results is to use smoothing. Computing the average class among a number of consequent examples allowed achieving a decrease in the error.
|
6 |
GENETIC ALGORITHMS FOR SAMPLE CLASSIFICATION OF MICROARRAY DATALiu, Dongqing 23 September 2005 (has links)
No description available.
|
7 |
Comparison of the Utility of Regression Analysis and K-Nearest Neighbor Technique to Estimate Above-Ground Biomass in Pine Forests Using Landsat ETM+ imageryPrabhu, Chitra L 13 May 2006 (has links)
There is a lack of precise and universally accepted approach in the quantification of carbon sequestered in aboveground woody biomass using remotely sensed data. Drafting of the Kyoto Protocol has made the subject of carbon sequestration more important, making the development of accurate and cost-effective remote sensing models a necessity. There has been much work done in estimating aboveground woody biomass from spectral data using the traditional multiple linear regression analysis approach and the Finnish k-nearest neighbor approach, but the accuracy of these methods to estimate biomass has not been compared. The purpose of this study is to compare the ability of these two methods in estimating above ground biomass (AGB) using spectral data derived from Landsat ETM+ imagery.
|
8 |
Método híbrido de detecção de intrusão aplicando inteligência artificial / Hybrid intrusion detection applying artificial inteligenceSouza, Cristiano Antonio de 09 February 2018 (has links)
Submitted by Miriam Lucas (miriam.lucas@unioeste.br) on 2018-04-06T14:31:39Z
No. of bitstreams: 2
Cristiano_Antonio_de_Souza_2018.pdf: 2020023 bytes, checksum: 1105b369d497031759e007333c20cad9 (MD5)
license_rdf: 0 bytes, checksum: d41d8cd98f00b204e9800998ecf8427e (MD5) / Made available in DSpace on 2018-04-06T14:31:39Z (GMT). No. of bitstreams: 2
Cristiano_Antonio_de_Souza_2018.pdf: 2020023 bytes, checksum: 1105b369d497031759e007333c20cad9 (MD5)
license_rdf: 0 bytes, checksum: d41d8cd98f00b204e9800998ecf8427e (MD5)
Previous issue date: 2018-02-09 / Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - CAPES / The last decades have been marked by rapid technological development, which was accelerated
by the creation of computer networks, and emphatically by the spread and growth of the Internet.
As a consequence of this context, private and confidential data of the most diverse areas
began to be treated and stored in distributed environments, making vital the security of this data.
Due to this fact, the number and variety of attacks on computer systems increased, mainly due
to the exploitation of vulnerabilities. Thence, the area of intrusion detection research has gained
notoriety, and hybrid detection methods using Artificial Intelligence techniques have been
achieving more satisfactory results than the use of such approaches individually. This work
consists of a Hybrid method of intrusion detection combining Artificial Neural Network (ANN)
and K-Nearest Neighbors KNN techniques. The evaluation of the proposed Hybrid method and
the comparison with ANN and KNN techniques individually were developed according to the
steps of the Knowledge Discovery in Databases process. For the realization of the experiments,
the NSL-KDD public database was selected and, with the attribute selection task, five sub-bases
were derived. The experimental results showed that the Hybrid method had better accuracy in
relation to ANN in all configurations, whereas in relation to KNN, it reached equivalent accuracy
and showed a significant reduction in processing time. Finally, it should be emphasized
that among the hybrid configurations evaluated quantitatively and statistically, the best performances
in terms of accuracy and classification time were obtained by the hybrid approaches
HIB(P25-N75)-C, HIB(P25-N75)-30 and HIB(P25-N75)-20. / As últimas décadas têm sido marcadas pelo rápido desenvolvimento tecnológico, o qual
foi acelerado pela criação das redes de computadores, e enfaticamente pela disseminação e crescimento
da Internet. Como consequência deste contexto, dados privados e sigilosos das mais
diversas áreas passaram a ser tratados e armazenados em ambientes distribuídos, tornando-se
vital a segurança dos mesmos. Decorrente ao fato, observa-se um crescimento na quantidade
e variedade de ataques a sistemas computacionais, principalmente pela exploração de vulnerabilidades.
Em função desse contexto, a área de pesquisa em detecção de intrusão tem ganhado
notoriedade, e os métodos híbridos de detecção utilizando técnicas de Inteligência Artificial
vêm alcançando resultados mais satisfatórios do que a utilização de tais abordagens de modo
individual. Este trabalho consiste em um método Híbrido de detecção de intrusão combinando
as técnicas Redes Neurais Artificiais (RNA) e K-Nearest Neighbors (KNN). A avaliação do
método Híbrido proposto e a comparação com as técnicas de RNA e KNN isoladamente foram
desenvolvidas de acordo com as etapas do processo de Knowledge Discovery in Databases
(KDD) . Para a realização dos experimentos selecionou-se a base de dados pública NSL-KDD,
sendo que com o processo de seleção de atributos derivou-se cinco sub-bases. Os resultados
experimentais comprovaram que o método Híbrido teve melhor acurácia em relação a RNA
em todas as configurações, ao passo que em relação ao KNN, alcançou acurácia equivalente e
apresentou relevante redução no tempo de processamento. Por fim, cabe ressaltar que dentre as
configurações híbridas avaliadas quantitativa e estatisticamente, os melhores desempenhos em
termos de acurácia e tempo de classificação foram obtidos pelas abordagens híbridas HIB(P25-
N75)-C, HIB(P25-N75)-30 e HIB(P25-N75)-20.
|
9 |
Datautvinning av klickdata : Kombination av klustring och klassifikation / Data mining of click data : Combination of clustering and classificationZhang, Xianjie, Bogic, Sebastian January 2018 (has links)
Ägare av webbplatser och applikationer tjänar ofta på att användare klickar på deras länkar. Länkarna kan bland annat vara reklam eller varor som säljs. Det finns många studier inom dataanalys angående om en sådan länk kommer att bli klickad, men få studier fokuserar på hur länkarna kan justeras för att bli klickade. Problemet som företaget Flygresor.se har är att de saknar ett verktyg för deras kunder, resebyråer, att analysera deras biljetter och därefter justera attributen för resorna. Den efterfrågade lösningen var en applikation som gav förslag på hur biljetterna skulle förändras för att bli mer klickade och på såsätt kunna sälja fler resor. I detta arbete byggdes en prototyp som använder sig av två olika datautvinningsmetoder, klustring med algoritmen DBSCAN och klassifikation med algoritmen k-NN. Algoritmerna användes tillsammans med en utvärderingsprocess, kallad DNNA, som analyserade resultatet från dessa två algoritmer och gav förslag på förändringar av artikelns attribut. Kombinationen av algoritmerna tillsammans med DNNA testades och utvärderades som lösning till problemet. Programmet lyckades förutse vilka attribut av biljetter som behövde justeras för att biljetterna skulle bli mer klickade. Rekommendationerna av justeringar var rimliga men eftersom andra liknande verktyg inte hade publicerats kunde detta arbetes resultat inte jämföras. / Owners of websites and applications usually profits through users that clicks on their links. These can be advertisements or items for sale amongst others. There are many studies about data analysis where they tell you if a link will be clicked, but only a few that focus on what needs to be adjusted to get the link clicked. The problem that Flygresor.se have is that they are missing a tool for their customers, travel agencies, that analyses their tickets and after that adjusts the attributes of those trips. The requested solution was an application which gave suggestions about how to change the tickets in a way that would make it more clicked and in that way, make more sales. A prototype was constructed which make use of two different data mining methods, clustering with the algorithm DBSCAN and classification with the algorithm knearest neighbor. These algorithms were used together with an evaluation process, called DNNA, which analyzes the result from the algorithms and gave suggestions about changes that could be done to the attributes of the links. The combination of the algorithms and DNNA was tested and evaluated as the solution to the problem. The program was able to predict what attributes of the tickets needed to be adjusted to get the tickets more clicks. ‘The recommendations of adjustments were reasonable but this result could not be compared to similar tools since they had not been published.
|
10 |
Brain Tumor Target Volume Determination for Radiation Therapy Treatment Planning Through the Use of Automated MRI SegmentationMazzara, Gloria Patrika 27 February 2004 (has links)
Radiation therapy seeks to effectively irradiate the tumor cells while minimizing the dose to adjacent normal cells. Prior research found that the low success rates for treating brain tumors would be improved with higher radiation doses to the tumor area. This is feasible only if the target volume can be precisely identified. However, the definition of tumor volume is still based on time-intensive, highly subjective manual outlining by radiation oncologists. In this study the effectiveness of two automated Magnetic Resonance Imaging (MRI) segmentation methods, k-Nearest Neighbors (kNN) and Knowledge-Guided (KG), in determining the Gross Tumor Volume (GTV) of brain tumors for use in radiation therapy was assessed. Three criteria were applied: accuracy of the contours; quality of the resulting treatment plan in terms of dose to the tumor; and a novel treatment plan evaluation technique based on post-treatment images.
The kNN method was able to segment all cases while the KG method was limited to enhancing tumors and gliomas with clear enhancing edges. Various software applications were developed to create a closed smooth contour that encompassed the tumor pixels from the segmentations and to integrate these results into the treatment planning software. A novel, probabilistic measurement of accuracy was introduced to compare the agreement of the segmentation methods with the weighted average physician volume. Both computer methods under-segment the tumor volume when compared with the physicians but performed within the variability of manual contouring (28% plus/minus12% for inter-operator variability).
Computer segmentations were modified vertically to compensate for their under-segmentation. When comparing radiation treatment plans designed from physician-defined tumor volumes with treatment plans developed from the modified segmentation results, the reference target volume was irradiated within the same level of conformity. Analysis of the plans based on post- treatment MRI showed that the segmentation plans provided similar dose coverage to areas being treated by the original treatment plans.
This research demonstrates that computer segmentations provide a feasible route to automatic target volume definition. Because of the lower variability and greater efficiency of the automated techniques, their use could lead to more precise plans and better prognosis for brain tumor patients.
|
Page generated in 0.058 seconds