Spelling suggestions: "subject:"nearest neighbor"" "subject:"nearest weighbor""
31 |
Método híbrido de detecção de intrusão aplicando inteligência artificial / Hybrid intrusion detection applying artificial inteligenceSouza, Cristiano Antonio de 09 February 2018 (has links)
Submitted by Miriam Lucas (miriam.lucas@unioeste.br) on 2018-04-06T14:31:39Z
No. of bitstreams: 2
Cristiano_Antonio_de_Souza_2018.pdf: 2020023 bytes, checksum: 1105b369d497031759e007333c20cad9 (MD5)
license_rdf: 0 bytes, checksum: d41d8cd98f00b204e9800998ecf8427e (MD5) / Made available in DSpace on 2018-04-06T14:31:39Z (GMT). No. of bitstreams: 2
Cristiano_Antonio_de_Souza_2018.pdf: 2020023 bytes, checksum: 1105b369d497031759e007333c20cad9 (MD5)
license_rdf: 0 bytes, checksum: d41d8cd98f00b204e9800998ecf8427e (MD5)
Previous issue date: 2018-02-09 / Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - CAPES / The last decades have been marked by rapid technological development, which was accelerated
by the creation of computer networks, and emphatically by the spread and growth of the Internet.
As a consequence of this context, private and confidential data of the most diverse areas
began to be treated and stored in distributed environments, making vital the security of this data.
Due to this fact, the number and variety of attacks on computer systems increased, mainly due
to the exploitation of vulnerabilities. Thence, the area of intrusion detection research has gained
notoriety, and hybrid detection methods using Artificial Intelligence techniques have been
achieving more satisfactory results than the use of such approaches individually. This work
consists of a Hybrid method of intrusion detection combining Artificial Neural Network (ANN)
and K-Nearest Neighbors KNN techniques. The evaluation of the proposed Hybrid method and
the comparison with ANN and KNN techniques individually were developed according to the
steps of the Knowledge Discovery in Databases process. For the realization of the experiments,
the NSL-KDD public database was selected and, with the attribute selection task, five sub-bases
were derived. The experimental results showed that the Hybrid method had better accuracy in
relation to ANN in all configurations, whereas in relation to KNN, it reached equivalent accuracy
and showed a significant reduction in processing time. Finally, it should be emphasized
that among the hybrid configurations evaluated quantitatively and statistically, the best performances
in terms of accuracy and classification time were obtained by the hybrid approaches
HIB(P25-N75)-C, HIB(P25-N75)-30 and HIB(P25-N75)-20. / As últimas décadas têm sido marcadas pelo rápido desenvolvimento tecnológico, o qual
foi acelerado pela criação das redes de computadores, e enfaticamente pela disseminação e crescimento
da Internet. Como consequência deste contexto, dados privados e sigilosos das mais
diversas áreas passaram a ser tratados e armazenados em ambientes distribuídos, tornando-se
vital a segurança dos mesmos. Decorrente ao fato, observa-se um crescimento na quantidade
e variedade de ataques a sistemas computacionais, principalmente pela exploração de vulnerabilidades.
Em função desse contexto, a área de pesquisa em detecção de intrusão tem ganhado
notoriedade, e os métodos híbridos de detecção utilizando técnicas de Inteligência Artificial
vêm alcançando resultados mais satisfatórios do que a utilização de tais abordagens de modo
individual. Este trabalho consiste em um método Híbrido de detecção de intrusão combinando
as técnicas Redes Neurais Artificiais (RNA) e K-Nearest Neighbors (KNN). A avaliação do
método Híbrido proposto e a comparação com as técnicas de RNA e KNN isoladamente foram
desenvolvidas de acordo com as etapas do processo de Knowledge Discovery in Databases
(KDD) . Para a realização dos experimentos selecionou-se a base de dados pública NSL-KDD,
sendo que com o processo de seleção de atributos derivou-se cinco sub-bases. Os resultados
experimentais comprovaram que o método Híbrido teve melhor acurácia em relação a RNA
em todas as configurações, ao passo que em relação ao KNN, alcançou acurácia equivalente e
apresentou relevante redução no tempo de processamento. Por fim, cabe ressaltar que dentre as
configurações híbridas avaliadas quantitativa e estatisticamente, os melhores desempenhos em
termos de acurácia e tempo de classificação foram obtidos pelas abordagens híbridas HIB(P25-
N75)-C, HIB(P25-N75)-30 e HIB(P25-N75)-20.
|
32 |
Fast Algorithms for Nearest Neighbour SearchKibriya, Ashraf Masood January 2007 (has links)
The nearest neighbour problem is of practical significance in a number of fields. Often we are interested in finding an object near to a given query object. The problem is old, and a large number of solutions have been proposed for it in the literature. However, it remains the case that even the most popular of the techniques proposed for its solution have not been compared against each other. Also, many techniques, including the old and popular ones, can be implemented in a number of ways, and often the different implementations of a technique have not been thoroughly compared either. This research presents a detailed investigation of different implementations of two popular nearest neighbour search data structures, KDTrees and Metric Trees, and compares the different implementations of each of the two structures against each other. The best implementations of these structures are then compared against each other and against two other techniques, Annulus Method and Cover Trees. Annulus Method is an old technique that was rediscovered during the research for this thesis. Cover Trees are one of the most novel and promising data structures for nearest neighbour search that have been proposed in the literature.
|
33 |
Classification using residual vector quantizationAli Khan, Syed Irteza 13 January 2014 (has links)
Residual vector quantization (RVQ) is a 1-nearest neighbor (1-NN) type of technique. RVQ is a multi-stage implementation of regular vector quantization. An input is successively quantized to the nearest codevector in each stage codebook. In classification, nearest neighbor techniques are very attractive since these techniques very accurately model the ideal Bayes class boundaries. However, nearest neighbor classification techniques require a large size of representative dataset. Since in such techniques a test input is assigned a class membership after an exhaustive search the entire training set, a reasonably large training set can make the implementation cost of the nearest neighbor classifier unfeasibly costly. Although, the k-d tree structure offers a far more efficient implementation of 1-NN search, however, the cost of storing the data points can become prohibitive, especially in higher dimensionality.
RVQ also offers a nice solution to a cost-effective implementation of 1-NN-based classification. Because of the direct-sum structure of the RVQ codebook, the memory and computational of cost 1-NN-based system is greatly reduced. Although, as compared to an equivalent 1-NN system, the multi-stage implementation of the RVQ codebook compromises the accuracy of the class boundaries, yet the classification error has been empirically shown to be within 3% to 4% of the performance of an equivalent 1-NN-based classifier.
|
34 |
Pattern Synthesis Techniques And Compact Data Representation Schemes For Efficient Nearest Neighbor ClassificationPulabaigari, Viswanath 01 1900 (has links) (PDF)
No description available.
|
35 |
Datautvinning av klickdata : Kombination av klustring och klassifikation / Data mining of click data : Combination of clustering and classificationZhang, Xianjie, Bogic, Sebastian January 2018 (has links)
Ägare av webbplatser och applikationer tjänar ofta på att användare klickar på deras länkar. Länkarna kan bland annat vara reklam eller varor som säljs. Det finns många studier inom dataanalys angående om en sådan länk kommer att bli klickad, men få studier fokuserar på hur länkarna kan justeras för att bli klickade. Problemet som företaget Flygresor.se har är att de saknar ett verktyg för deras kunder, resebyråer, att analysera deras biljetter och därefter justera attributen för resorna. Den efterfrågade lösningen var en applikation som gav förslag på hur biljetterna skulle förändras för att bli mer klickade och på såsätt kunna sälja fler resor. I detta arbete byggdes en prototyp som använder sig av två olika datautvinningsmetoder, klustring med algoritmen DBSCAN och klassifikation med algoritmen k-NN. Algoritmerna användes tillsammans med en utvärderingsprocess, kallad DNNA, som analyserade resultatet från dessa två algoritmer och gav förslag på förändringar av artikelns attribut. Kombinationen av algoritmerna tillsammans med DNNA testades och utvärderades som lösning till problemet. Programmet lyckades förutse vilka attribut av biljetter som behövde justeras för att biljetterna skulle bli mer klickade. Rekommendationerna av justeringar var rimliga men eftersom andra liknande verktyg inte hade publicerats kunde detta arbetes resultat inte jämföras. / Owners of websites and applications usually profits through users that clicks on their links. These can be advertisements or items for sale amongst others. There are many studies about data analysis where they tell you if a link will be clicked, but only a few that focus on what needs to be adjusted to get the link clicked. The problem that Flygresor.se have is that they are missing a tool for their customers, travel agencies, that analyses their tickets and after that adjusts the attributes of those trips. The requested solution was an application which gave suggestions about how to change the tickets in a way that would make it more clicked and in that way, make more sales. A prototype was constructed which make use of two different data mining methods, clustering with the algorithm DBSCAN and classification with the algorithm knearest neighbor. These algorithms were used together with an evaluation process, called DNNA, which analyzes the result from the algorithms and gave suggestions about changes that could be done to the attributes of the links. The combination of the algorithms and DNNA was tested and evaluated as the solution to the problem. The program was able to predict what attributes of the tickets needed to be adjusted to get the tickets more clicks. ‘The recommendations of adjustments were reasonable but this result could not be compared to similar tools since they had not been published.
|
36 |
Exploring Techniques for Providing Privacy in Location-Based Services Nearest Neighbor QueryAsanya, John-Charles 01 January 2015 (has links)
Increasing numbers of people are subscribing to location-based services, but as the popularity grows so are the privacy concerns. Varieties of research exist to address these privacy concerns. Each technique tries to address different models with which location-based services respond to subscribers. In this work, we present ideas to address privacy concerns for the two main models namely: the snapshot nearest neighbor query model and the continuous nearest neighbor query model. First, we address snapshot nearest neighbor query model where location-based services response represents a snapshot of point in time. In this model, we introduce a novel idea based on the concept of an open set in a topological space where points belongs to a subset called neighborhood of a point. We extend this concept to provide anonymity to real objects where each object belongs to a disjointed neighborhood such that each neighborhood contains a single object. To help identify the objects, we implement a database which dynamically scales in direct proportion with the size of the neighborhood. To retrieve information secretly and allow the database to expose only requested information, private information retrieval protocols are executed twice on the data. Our study of the implementation shows that the concept of a single object neighborhood is able to efficiently scale the database with the objects in the area. The size of the database grows with the size of the grid and the objects covered by the location-based services. Typically, creating neighborhoods, computing distances between objects in the area, and running private information retrieval protocols causes the CPU to respond slowly with this increase in database size. In order to handle a large number of objects, we explore the concept of kernel and parallel computing in GPU. We develop GPU parallel implementation of the snapshot query to handle large number of objects. In our experiment, we exploit parameter tuning. The results show that with parameter tuning and parallel computing power of GPU we are able to significantly reduce the response time as the number of objects increases. To determine response time of an application without knowledge of the intricacies of GPU architecture, we extend our analysis to predict GPU execution time. We develop the run time equation for an operation and extrapolate the run time for a problem set based on the equation, and then we provide a model to predict GPU response time. As an alternative, the snapshot nearest neighbor query privacy problem can be addressed using secure hardware computing which can eliminate the need for protecting the rest of the sub-system, minimize resource usage and network transmission time. In this approach, a secure coprocessor is used to provide privacy. We process all information inside the coprocessor to deny adversaries access to any private information. To obfuscate access pattern to external memory location, we use oblivious random access memory methodology to access the server. Experimental evaluation shows that using a secure coprocessor reduces resource usage and query response time as the size of the coverage area and objects increases. Second, we address privacy concerns in the continuous nearest neighbor query model where location-based services automatically respond to a change in object*s location. In this model, we present solutions for two different types known as moving query static object and moving query moving object. For the solutions, we propose plane partition using a Voronoi diagram, and a continuous fractal space filling curve using a Hilbert curve order to create a continuous nearest neighbor relationship between the points of interest in a path. Specifically, space filling curve results in multi-dimensional to 1-dimensional object mapping where values are assigned to the objects based on proximity. To prevent subscribers from issuing a query each time there is a change in location and to reduce the response time, we introduce the concept of transition and update time to indicate where and when the nearest neighbor changes. We also introduce a database that dynamically scales with the size of the objects in a path to help obscure and relate objects. By executing the private information retrieval protocol twice on the data, the user secretly retrieves requested information from the database. The results of our experiment show that using plane partitioning and a fractal space filling curve to create nearest neighbor relationships with transition time between objects reduces the total response time.
|
37 |
Microeconometric Models with Endogeneity -- Theoretical and Empirical StudiesDong, Yingying January 2009 (has links)
Thesis advisor: Arthur Lewbel / This dissertation consists of three independent essays in applied microeconomics and econometrics. Essay 1 investigates the issue why individuals with health insurance use more health care. One obvious reason is that health care is cheaper for the insured. But additionally, having insurance can encourage unhealthy behavior via moral hazard. The effect of health insurance on medical utilization has been extensively studied; however, previous work has mostly ignored the effect of insurance on behavior and how that in turn affects medical utilization. This essay examines these distinct effects. The increased medical utilization due to reduced prices may help the insured maintain good health, while that due to increased unhealthy behavior does not, so distinguishing these two effects has important policy implications. A two-period dynamic forward-looking model is constructed to derive the structural causal relationships among the decision to buy insurance, health behaviors (drinking, smoking, and exercise), and medical utilization. The model shows how exogenous changes in insurance prices and past behaviors can identify the direct and indirect effects of insurance on medical utilization. An empirical analysis also distinguishes between intensive and extensive margins (e.g., changes in the number of drinkers vs. the amount of alcohol consumed) of the insurance effect, which turns out to be empirically important. Health insurance is found to encourage less healthy behavior, particularly heavy drinking, but this does not yield a short term perceptible increase in doctor or hospital visits. The effects of health insurance are primarily found at the intensive margin, e.g., health insurance may not cause a non-drinker to take up drinking, while it encourages a heavy drinker to drink even more. These results suggest that to counteract behavioral moral hazard, health insurance should be coupled with incentives that target individuals who currently engage in unhealthy behaviors, such as heavy drinkers. Essay 2 examines the effect of repeating kindergarten on the retained children's academic performance. Although most existing research concludes that grade retention generates no benefits for retainees' later academic performance, holding low achieving children back has been a popular practice for decades. Drawing on a recently collected nationally representative data set in the US, this paper estimates the causal effect of kindergarten retention on the retained children's later academic performance. Since children are observed being held back only when they enroll in schools that permit retention, this paper jointly models 1) the decision of entering a school allowing for kindergarten retention, 2) the decision of undergoing a retention treatment in kindergarten, and 3) children's academic performance in higher grades. The retention treatment is modeled as a binary choice with sample selection. The outcome equations are linear regressions including the kindergarten retention dummy as an endogenous regressor with a correlated random coefficient. A control function estimator is developed for estimating the resulting double-hurdle treatment model, which allows for unobserved heterogeneity in the retention effect. As a comparison, a nonparametric bias-corrected nearest neighbor matching estimator is also implemented. Holding children back in kindergarten is found to have positive but diminishing effects on their academic performance up to the third grade. Essay 3 proves the semiparametric identification of a binary choice model having an endogenous regressor without relying on outside instruments. A simple estimator and a test for endogeneity are provided based on this identification. These results are applied to analyze working age male's migration within the US, where labor income is potentially endogenous. Identification relies on the fact that the migration probability among workers is close to linear in age while labor income is nonlinear in age(when both are nonparametrically estimated). Using data from the PSID, this study finds that labor income is endogenous and that ignoring this endogeneity leads to downward bias in the estimated effect of labor income on the migration probability. / Thesis (PhD) — Boston College, 2009. / Submitted to: Boston College. Graduate School of Arts and Sciences. / Discipline: Economics.
|
38 |
Learning deep embeddings by learning to rankHe, Kun 05 February 2019 (has links)
We study the problem of embedding high-dimensional visual data into low-dimensional vector representations. This is an important component in many computer vision applications involving nearest neighbor retrieval, as embedding techniques not only perform dimensionality reduction, but can also capture task-specific semantic similarities. In this thesis, we use deep neural networks to learn vector embeddings, and develop a gradient-based optimization framework that is capable of optimizing ranking-based retrieval performance metrics, such as the widely used Average Precision (AP) and Normalized Discounted Cumulative Gain (NDCG). Our framework is applied in three applications.
First, we study Supervised Hashing, which is concerned with learning compact binary vector embeddings for fast retrieval, and propose two novel solutions. The first solution optimizes Mutual Information as a surrogate ranking objective, while the other directly optimizes AP and NDCG, based on the discovery of their closed-form expressions for discrete Hamming distances. These optimization problems are NP-hard, therefore we derive their continuous relaxations to enable gradient-based optimization with neural networks. Our solutions establish the state-of-the-art on several image retrieval benchmarks.
Next, we learn deep neural networks to extract Local Feature Descriptors from image patches. Local features are used universally in low-level computer vision tasks that involve sparse feature matching, such as image registration and 3D reconstruction, and their matching is a nearest neighbor retrieval problem. We leverage our AP optimization technique to learn both binary and real-valued descriptors for local image patches. Compared to competing approaches, our solution eliminates complex heuristics, and performs more accurately in the tasks of patch verification, patch retrieval, and image matching.
Lastly, we tackle Deep Metric Learning, the general problem of learning real-valued vector embeddings using deep neural networks. We propose a learning to rank solution through optimizing a novel quantization-based approximation of AP. For downstream tasks such as retrieval and clustering, we demonstrate promising results on standard benchmarks, especially in the few-shot learning scenario, where the number of labeled examples per class is limited.
|
39 |
Automatic text categorization for information filtering.January 1998 (has links)
Ho Chao Yang. / Thesis (M.Phil.)--Chinese University of Hong Kong, 1998. / Includes bibliographical references (leaves 157-163). / Abstract also in Chinese. / Abstract --- p.i / Acknowledgment --- p.iii / List of Figures --- p.viii / List of Tables --- p.xiv / Chapter 1 --- Introduction --- p.1 / Chapter 1.1 --- Automatic Document Categorization --- p.1 / Chapter 1.2 --- Information Filtering --- p.3 / Chapter 1.3 --- Contributions --- p.6 / Chapter 1.4 --- Organization of the Thesis --- p.7 / Chapter 2 --- Related Work --- p.9 / Chapter 2.1 --- Existing Automatic Document Categorization Approaches --- p.9 / Chapter 2.1.1 --- Rule-Based Approach --- p.10 / Chapter 2.1.2 --- Similarity-Based Approach --- p.13 / Chapter 2.2 --- Existing Information Filtering Approaches --- p.19 / Chapter 2.2.1 --- Information Filtering Systems --- p.19 / Chapter 2.2.2 --- Filtering in TREC --- p.21 / Chapter 3 --- Document Pre-Processing --- p.23 / Chapter 3.1 --- Document Representation --- p.23 / Chapter 3.2 --- Classification Scheme Learning Strategy --- p.26 / Chapter 4 --- A New Approach - IBRI --- p.31 / Chapter 4.1 --- Overview of Our New IBRI Approach --- p.31 / Chapter 4.2 --- The IBRI Representation and Definitions --- p.34 / Chapter 4.3 --- The IBRI Learning Algorithm --- p.37 / Chapter 5 --- IBRI Experiments --- p.43 / Chapter 5.1 --- Experimental Setup --- p.43 / Chapter 5.2 --- Evaluation Metric --- p.45 / Chapter 5.3 --- Results --- p.46 / Chapter 6 --- A New Approach - GIS --- p.50 / Chapter 6.1 --- Motivation of GIS --- p.50 / Chapter 6.2 --- Similarity-Based Learning --- p.51 / Chapter 6.3 --- The Generalized Instance Set Algorithm (GIS) --- p.58 / Chapter 6.4 --- Using GIS Classifiers for Classification --- p.63 / Chapter 6.5 --- Time Complexity --- p.64 / Chapter 7 --- GIS Experiments --- p.68 / Chapter 7.1 --- Experimental Setup --- p.68 / Chapter 7.2 --- Results --- p.73 / Chapter 8 --- A New Information Filtering Approach Based on GIS --- p.87 / Chapter 8.1 --- Information Filtering Systems --- p.87 / Chapter 8.2 --- GIS-Based Information Filtering --- p.90 / Chapter 9 --- Experiments on GIS-based Information Filtering --- p.95 / Chapter 9.1 --- Experimental Setup --- p.95 / Chapter 9.2 --- Results --- p.100 / Chapter 10 --- Conclusions and Future Work --- p.108 / Chapter 10.1 --- Conclusions --- p.108 / Chapter 10.2 --- Future Work --- p.110 / Chapter A --- Sample Documents in the corpora --- p.111 / Chapter B --- Details of Experimental Results of GIS --- p.120 / Chapter C --- Computational Time of Reuters-21578 Experiments --- p.141
|
40 |
Superseding neighbor search on uncertain data. / 在不確定的空間數據庫中尋找最高取代性的最近鄰 / Zai bu que ding de kong jian shu ju ku zhong xun zhao zui gao qu dai xing de zui jin linJanuary 2009 (has links)
Yuen, Sze Man. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2009. / Includes bibliographical references (leaves [44]-46). / Abstract also in Chinese. / Thesis Committee --- p.i / Abstract --- p.ii / Acknowledgement --- p.iv / Chapter 1 --- Introduction --- p.1 / Chapter 2 --- Related Work --- p.6 / Chapter 2.1 --- Nearest Neighbor Search on Precise Data --- p.6 / Chapter 2.2 --- NN Search on Uncertain Data --- p.8 / Chapter 3 --- Problem Definitions and Basic Characteristics --- p.11 / Chapter 4 --- The Full-Graph Approach --- p.16 / Chapter 5 --- The Pipeline Approach --- p.19 / Chapter 5.1 --- The Algorithm --- p.20 / Chapter 5.2 --- Edge Phase --- p.24 / Chapter 5.3 --- Pruning Phase --- p.27 / Chapter 5.4 --- Validating Phase --- p.28 / Chapter 5.5 --- Discussion --- p.29 / Chapter 6 --- Extension --- p.31 / Chapter 7 --- Experiment --- p.34 / Chapter 7.1 --- Properties of the SNN-core --- p.34 / Chapter 7.2 --- Efficiency of Our Algorithms --- p.38 / Chapter 8 --- Conclusions and Future Work --- p.42 / Chapter A --- List of Publications --- p.43 / Bibliography --- p.44
|
Page generated in 0.0485 seconds