51 |
Método híbrido de detecção de intrusão aplicando inteligência artificial / Hybrid intrusion detection applying artificial intelligence
Souza, Cristiano Antonio de 09 February 2018 (has links)
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - CAPES / The last decades have been marked by rapid technological development, accelerated by the creation of computer networks and, emphatically, by the spread and growth of the Internet. As a consequence, private and confidential data from the most diverse areas began to be processed and stored in distributed environments, making the security of this data vital. Accordingly, the number and variety of attacks on computer systems increased, mainly through the exploitation of vulnerabilities. As a result, the area of intrusion detection research has gained notoriety, and hybrid detection methods using Artificial Intelligence techniques have been achieving more satisfactory results than such approaches used individually. This work
consists of a hybrid intrusion detection method combining Artificial Neural Network (ANN) and K-Nearest Neighbors (KNN) techniques. The evaluation of the proposed hybrid method, and its comparison with the ANN and KNN techniques applied individually, followed the steps of the Knowledge Discovery in Databases (KDD) process. For the experiments, the public NSL-KDD database was selected and, through attribute selection, five sub-bases were derived. The experimental results showed that the hybrid method achieved better accuracy than the ANN in all configurations, while reaching accuracy equivalent to KNN with a significant reduction in processing time. Finally, it should be emphasized that among the hybrid configurations evaluated quantitatively and statistically, the best performances in terms of accuracy and classification time were obtained by the hybrid approaches HIB(P25-N75)-C, HIB(P25-N75)-30, and HIB(P25-N75)-20.
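The ANN-plus-KNN combination evaluated above can be sketched in miniature. The abstract does not specify how the two classifiers are coupled, so the routing rule below (a confidence threshold on a logistic first stage standing in for the ANN) is an illustrative assumption, as are the toy data and all parameter values:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-class data: two Gaussian blobs with some overlap.
X0 = rng.normal(loc=-1.0, scale=1.0, size=(100, 2))
X1 = rng.normal(loc=+1.0, scale=1.0, size=(100, 2))
X = np.vstack([X0, X1])
y = np.array([0] * 100 + [1] * 100)

# Stage 1: logistic regression as a lightweight stand-in for the ANN.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

Xb = np.hstack([X, np.ones((200, 1))])   # add a bias column
w = np.zeros(3)
for _ in range(500):
    p = sigmoid(Xb @ w)
    w -= 0.1 * Xb.T @ (p - y) / len(y)   # gradient step

# Stage 2: k-NN fallback for points the first stage is unsure about.
def knn_predict(x, k=5):
    d = np.linalg.norm(X - x, axis=1)
    votes = y[np.argsort(d)[:k]]
    return int(votes.mean() >= 0.5)

def hybrid_predict(x, threshold=0.25):
    p = sigmoid(np.append(x, 1.0) @ w)
    if abs(p - 0.5) >= threshold:        # confident: use the fast stage
        return int(p >= 0.5)
    return knn_predict(x)                # uncertain: defer to k-NN

preds = np.array([hybrid_predict(x) for x in X])
print("training accuracy:", (preds == y).mean())
```

The point of such a split is the one the abstract measures: most inputs are resolved by the cheap first stage, so the expensive exhaustive k-NN search runs only on the hard cases, cutting processing time while keeping k-NN-level accuracy.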
|
52 |
Classification using residual vector quantization
Ali Khan, Syed Irteza 13 January 2014 (has links)
Residual vector quantization (RVQ) is a 1-nearest neighbor (1-NN) type of technique: a multi-stage implementation of regular vector quantization in which an input is successively quantized to the nearest codevector in each stage codebook. In classification, nearest neighbor techniques are very attractive since they accurately model the ideal Bayes class boundaries. However, nearest neighbor classification requires a large, representative dataset. Since a test input is assigned a class membership only after an exhaustive search of the entire training set, a reasonably large training set can make a nearest neighbor classifier prohibitively costly to implement. Although the k-d tree structure offers a far more efficient implementation of 1-NN search, the cost of storing the data points can become prohibitive, especially in higher dimensions.
RVQ also offers a cost-effective implementation of 1-NN-based classification. Because of the direct-sum structure of the RVQ codebook, the memory and computational cost of a 1-NN-based system is greatly reduced. Although the multi-stage implementation of the RVQ codebook compromises the accuracy of the class boundaries compared to an equivalent 1-NN system, the classification error has been empirically shown to be within 3% to 4% of the performance of an equivalent 1-NN-based classifier.
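The successive-residual scheme described above can be sketched in a few lines. This is a toy illustration, not the author's implementation: the tiny k-means trainer, the three-stage depth, and the codebook size of 8 are arbitrary choices made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(1)

def kmeans(X, k, iters=20):
    """Tiny k-means used to train one stage codebook."""
    C = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - C[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                C[j] = X[labels == j].mean(axis=0)
    return C

X = rng.normal(size=(500, 4))

# Train successive stage codebooks on the residuals of the previous stage.
codebooks, R = [], X.copy()
for _ in range(3):                       # three RVQ stages
    C = kmeans(R, k=8)
    codebooks.append(C)
    idx = np.argmin(((R[:, None] - C[None]) ** 2).sum(-1), axis=1)
    R = R - C[idx]                       # residual passed to the next stage

def rvq_encode(x):
    """Quantize x stage by stage; return stage indices and reconstruction."""
    recon, ids = np.zeros_like(x), []
    r = x.copy()
    for C in codebooks:
        j = int(np.argmin(((C - r) ** 2).sum(-1)))
        ids.append(j)
        recon, r = recon + C[j], r - C[j]
    return ids, recon

ids, recon = rvq_encode(X[0])
```

The direct-sum property is visible here: three stages of 8 codevectors each represent 8^3 = 512 effective codevectors while storing only 24, which is the source of the memory and search savings the abstract describes.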
|
53 |
Simple, Faster Kinetic Data Structures
Rahmati, Zahed 28 August 2014 (links)
Proximity problems and point set embeddability problems are fundamental and well-studied in computational geometry and graph drawing. Examples of such problems that are of particular interest to us in this dissertation include: finding the closest pair among a set P of points, finding the k-nearest neighbors to each point p in P, answering reverse k-nearest neighbor queries, computing the Yao graph, the Semi-Yao graph and the Euclidean minimum spanning tree of P, and mapping the vertices of a planar graph to a set P of points without inducing edge crossings.
In this dissertation, we consider the so-called kinetic versions of these problems, that is, the points are allowed to move continuously along known trajectories, which are subject to change. We design a set of data structures and a mechanism to efficiently update them. These updates occur at critical, discrete times. Also, a query may arrive at any time. We want to answer queries quickly without solving problems from scratch, so we maintain solutions continuously. We present new techniques that give kinetic solutions with better performance for some of these problems, and we provide the first kinetic results for others. In particular, we provide:
• A simple kinetic data structure (KDS) to maintain all the nearest neighbors and the closest pair. Our deterministic kinetic approach for maintenance of all the nearest neighbors improves the previous randomized kinetic algorithm.
• An exact KDS for maintenance of the Euclidean minimum spanning tree, which improves the previous KDS.
• The first KDSs for maintenance of the Yao graph and the Semi-Yao graph.
• The first KDS to consider maintaining plane graphs on moving points.
• The first KDS for maintenance of all the k-nearest neighbors, for any k ≥ 1.
• The first KDS to answer the reverse k-nearest neighbor queries, for any k ≥ 1 in any fixed dimension, on a set of moving points. / Graduate
|
54 |
Pattern Synthesis Techniques And Compact Data Representation Schemes For Efficient Nearest Neighbor Classification
Pulabaigari, Viswanath 01 1900 (has links) (PDF)
No description available.
|
55 |
Datautvinning av klickdata : Kombination av klustring och klassifikation / Data mining of click data : Combination of clustering and classification
Zhang, Xianjie, Bogic, Sebastian January 2018 (has links)
Owners of websites and applications usually profit from users clicking on their links. The links can be advertisements or items for sale, among other things. There are many studies in data analysis on whether such a link will be clicked, but few focus on how the links can be adjusted to get clicked. The problem the company Flygresor.se has is that it lacks a tool for its customers, travel agencies, to analyze their tickets and then adjust the attributes of the trips. The requested solution was an application that gave suggestions on how to change the tickets to make them more clicked and thereby sell more trips. In this work, a prototype was built that uses two different data mining methods: clustering with the DBSCAN algorithm and classification with the k-nearest neighbor (k-NN) algorithm. The algorithms were used together with an evaluation process, called DNNA, which analyzed the results of the two algorithms and gave suggestions for changes to an item's attributes. The combination of the algorithms together with DNNA was tested and evaluated as the solution to the problem. The program was able to predict which ticket attributes needed to be adjusted for the tickets to receive more clicks. The recommended adjustments were reasonable, but since no similar tools had been published, the results of this work could not be compared.
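The clustering-then-classification pipeline can be sketched with scikit-learn. The data below is a hypothetical stand-in for ticket attributes, and the DNNA evaluation step is not reproduced; only the DBSCAN-plus-k-NN backbone is shown.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.neighbors import KNeighborsClassifier

# Toy stand-in for ticket attributes (e.g. price, duration): two dense groups.
X, _ = make_blobs(n_samples=200, centers=[[0, 0], [5, 5]],
                  cluster_std=0.5, random_state=0)

# Step 1: DBSCAN groups tickets by attribute similarity; label -1 marks noise.
labels = DBSCAN(eps=0.8, min_samples=5).fit_predict(X)

# Step 2: k-NN learns the discovered cluster structure, so a new ticket can
# be assigned to the group of most similar existing tickets.
core = labels != -1
knn = KNeighborsClassifier(n_neighbors=5).fit(X[core], labels[core])

new_ticket = np.array([[4.8, 5.1]])
print("assigned cluster:", knn.predict(new_ticket)[0])
```

An analysis step like DNNA could then compare the new ticket's attributes against the centroid of its assigned cluster to suggest which attributes to adjust.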
|
56 |
Exploring Techniques for Providing Privacy in Location-Based Services Nearest Neighbor Query
Asanya, John-Charles 01 January 2015 (has links)
Increasing numbers of people are subscribing to location-based services, but as their popularity grows, so do the privacy concerns. A variety of research exists to address these concerns, with each technique addressing a different model by which location-based services respond to subscribers. In this work, we present ideas to address privacy concerns for the two main models: the snapshot nearest neighbor query model and the continuous nearest neighbor query model. First, we address the snapshot nearest neighbor query model, where the location-based service's response represents a snapshot at a point in time. In this model, we introduce a novel idea based on the concept of an open set in a topological space, where points belong to a subset called the neighborhood of a point. We extend this concept to provide anonymity to real objects, where each object belongs to a disjoint neighborhood such that each neighborhood contains a single object. To help identify the objects, we implement a database which dynamically scales in direct proportion to the size of the neighborhood. To retrieve information secretly and allow the database to expose only requested information, private information retrieval protocols are executed twice on the data. Our study of the implementation shows that the concept of a single-object neighborhood is able to efficiently scale the database with the objects in the area. The size of the database grows with the size of the grid and the objects covered by the location-based service. Typically, creating neighborhoods, computing distances between objects in the area, and running private information retrieval protocols cause the CPU to respond slowly as the database grows. In order to handle a large number of objects, we explore the concepts of kernels and parallel computing on the GPU. We develop a GPU-parallel implementation of the snapshot query to handle large numbers of objects, and in our experiments we explore parameter tuning.
The results show that with parameter tuning and the parallel computing power of the GPU we are able to significantly reduce the response time as the number of objects increases. To determine the response time of an application without knowledge of the intricacies of the GPU architecture, we extend our analysis to predict GPU execution time: we develop a run-time equation for an operation, extrapolate the run time for a problem set based on that equation, and then provide a model to predict GPU response time. As an alternative, the snapshot nearest neighbor query privacy problem can be addressed using secure hardware, which can eliminate the need to protect the rest of the subsystem and minimize resource usage and network transmission time. In this approach, a secure coprocessor is used to provide privacy: we process all information inside the coprocessor to deny adversaries access to any private information. To obfuscate the access pattern to external memory locations, we use an oblivious random access memory methodology to access the server. Experimental evaluation shows that using a secure coprocessor reduces resource usage and query response time as the size of the coverage area and the number of objects increase. Second, we address privacy concerns in the continuous nearest neighbor query model, where the location-based service automatically responds to a change in an object's location. In this model, we present solutions for two different types, known as moving query static object and moving query moving object. For the solutions, we propose plane partitioning using a Voronoi diagram, and a continuous fractal space-filling curve using a Hilbert curve order, to create a continuous nearest neighbor relationship between the points of interest along a path. Specifically, the space-filling curve yields a multi-dimensional to 1-dimensional object mapping in which values are assigned to objects based on proximity. To prevent subscribers from issuing a query each time there is a change in location, and to reduce the response time, we introduce the concepts of transition and update time to indicate where and when the nearest neighbor changes. We also introduce a database that dynamically scales with the number of objects along a path to help obscure and relate objects. By executing the private information retrieval protocol twice on the data, the user secretly retrieves requested information from the database. The results of our experiments show that using plane partitioning and a fractal space-filling curve to create nearest neighbor relationships with transition times between objects reduces the total response time.
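The multi-dimensional to 1-dimensional Hilbert-curve mapping mentioned above can be illustrated with the standard iterative index-to-cell construction. This is a generic sketch of the curve itself, not the dissertation's code; the 8 x 8 grid size is an arbitrary choice.

```python
def d2xy(n, d):
    """Map a 1-D Hilbert index d to a 2-D cell (x, y) on an n x n grid
    (n a power of two). Standard iterative construction."""
    x = y = 0
    s, t = 1, d
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:                      # rotate the quadrant
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x, y = x + s * rx, y + s * ry
        t //= 4
        s *= 2
    return x, y

# Walk the curve on an 8 x 8 grid: consecutive indices always land on
# adjacent cells, so sorting points of interest by Hilbert index keeps
# 2-D neighbors close together in the 1-D ordering.
path = [d2xy(8, d) for d in range(64)]
```

This locality property is what makes the curve useful for the continuous query model: nearby points of interest receive nearby 1-D values, so nearest neighbor relationships along a path can be maintained on the 1-D ordering.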
|
57 |
[en] INTELLIGENT SYSTEM TO SUPPORT BASKETBALL COACHES / [pt] SISTEMA INTELIGENTE DE APOIO A TÉCNICOS DE BASQUETE
EDUARDO VERAS ARGENTO 12 September 2024 (has links)
[en] In light of the recent significant growth in technological capabilities and the observed advancements in the field of computational intelligence, the latter has demonstrated potential for application in various sectors of society. In the context of extreme competitiveness and the increasing relevance of the most famous sports around the world, basketball presents itself as an interesting sport for the application of decision-support mechanisms capable of enhancing the efficacy and consistency of team victories in championships. In this context, this study proposes the development of decision-support systems based on neural network and k-Nearest Neighbors (kNN) models. The goal is to evaluate, for each substitution during a game, which group of players on the court, known as a lineup, has the highest probability of gaining an advantage over the opponent. For this, models were trained to classify, at the end of a sequence of possessions, the team that would gain an advantage, and to predict the magnitude of that advantage. The database was obtained from Novo Basquete Brasil (NBB) matches, involving player statistics, match details, and various game contexts. The model achieved an accuracy of 76.99 percent of possessions in projecting the advantage between the two lineups on the court, demonstrating the potential of computational intelligence methods for decision-making in professional sports. Finally, the study highlights the importance of using such tools in conjunction with human experience, encouraging future research toward even more sophisticated and effective models for decision-making in sports.
|
58 |
Nonparametric tests to detect relationship between variables in the presence of heteroscedastic treatment effects
Tolos, Siti January 1900 (has links)
Doctor of Philosophy / Department of Statistics / Haiyan Wang / Statistical tools to detect nonlinear relationships between variables are commonly needed in various practices. The first part of the dissertation presents a test of independence between a response variable, either discrete or continuous, and a continuous covariate after adjusting for heteroscedastic treatment effects. The method first augments each pair of the data for all treatments with a fixed number of nearest neighbors as pseudo-replicates. A test statistic is then constructed by taking the difference of two quadratic forms. Using such differences eliminates the need to estimate any nonlinear regression function, reducing the computational time. Although using a fixed number of nearest neighbors poses significant difficulty for the inference compared to letting the number of nearest neighbors go to infinity, the parametric standardizing rate is obtained for the asymptotic distribution of the proposed test statistics. Numerical studies show that the new test procedure maintains the intended type I error rate and has robust power to detect nonlinear dependency in the presence of outliers. The second part of the dissertation discusses the theory and numerical studies for testing the nonparametric effects of no covariate-treatment interaction and no main covariate effect, based on the decomposition of the conditional mean of a regression function that is potentially nonlinear. A similar test was discussed in Wang and Akritas (2006) for effects defined through the decomposition of the conditional distribution function, but with the number of pseudo-replicates going to infinity; consequently, their test statistics have slow convergence rates and computational speeds. Both limitations are overcome with the new model and tests. The last part of the dissertation develops theory and numerical studies for testing no covariate-treatment interaction, no simple covariate effect, and no main covariate effect for cases where the number of factor levels and the number of covariate values are large.
|
59 |
Improving the performance of the prediction analysis of microarrays algorithm via different thresholding methods and heteroscedastic modeling
Sahtout, Mohammad Omar January 1900 (has links)
Doctor of Philosophy / Department of Statistics / Haiyan Wang / This dissertation considers different methods to improve the performance of the Prediction Analysis of Microarrays (PAM). PAM is a popular algorithm for high-dimensional classification, but it has the drawback of retaining too many features even after multiple runs of the algorithm to perform further feature selection. The average number of selected features is 2611 when PAM is applied to 10 multi-class microarray human cancer datasets; such a large number of features makes follow-up studies difficult. This drawback results from the soft thresholding method used in the PAM algorithm and from PAM's thresholding parameter estimate. In this dissertation, we extend the PAM algorithm with two other thresholding methods (hard and order thresholding) and a deep search algorithm to achieve a better thresholding parameter estimate. In addition to the newly proposed algorithms, we derive an approximation for the probability of misclassification of the hard-thresholded algorithm in the binary case. Beyond the aforementioned work, this dissertation considers the heteroscedastic case, in which the variances of each feature differ across classes. The PAM algorithm assumes the variance of the values for each predictor to be constant across classes. We found that this homogeneity assumption is invalid for many features in most datasets, which motivates us to develop new heteroscedastic versions of the algorithms, again with the different thresholding methods. All new algorithms proposed in this dissertation are extensively tested and compared based on real data and Monte Carlo simulation studies. The new algorithms, in general, not only achieve better cancer status prediction accuracy but also result in more parsimonious models with a significantly smaller number of genes.
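The soft and hard thresholding rules at the heart of the comparison can be written down directly. This is a generic sketch of the two operators applied to standardized class-centroid deviations, not the dissertation's code; the example values and the threshold delta = 1.0 are arbitrary illustrations.

```python
import numpy as np

def soft_threshold(d, delta):
    """Shrink each component toward zero by delta (PAM-style);
    components smaller than delta in magnitude are zeroed out."""
    return np.sign(d) * np.maximum(np.abs(d) - delta, 0.0)

def hard_threshold(d, delta):
    """Keep components that exceed delta in magnitude unchanged;
    zero out the rest."""
    return np.where(np.abs(d) > delta, d, 0.0)

# Standardized deviations of one class centroid from the overall centroid.
d = np.array([-3.0, -0.5, 0.2, 1.0, 2.5])
print(soft_threshold(d, 1.0))
print(hard_threshold(d, 1.0))
```

Both rules zero out weak features and thus perform feature selection, but soft thresholding also shrinks the surviving components toward zero, biasing strong class-discriminating deviations; hard thresholding leaves survivors untouched, which is one motivation for comparing the two.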
|
60 |
Microeconometric Models with Endogeneity -- Theoretical and Empirical Studies
Dong, Yingying January 2009 (has links)
Thesis advisor: Arthur Lewbel / This dissertation consists of three independent essays in applied microeconomics and econometrics. Essay 1 investigates why individuals with health insurance use more health care. One obvious reason is that health care is cheaper for the insured. But additionally, having insurance can encourage unhealthy behavior via moral hazard. The effect of health insurance on medical utilization has been extensively studied; however, previous work has mostly ignored the effect of insurance on behavior and how that in turn affects medical utilization. This essay examines these distinct effects. The increased medical utilization due to reduced prices may help the insured maintain good health, while that due to increased unhealthy behavior does not, so distinguishing these two effects has important policy implications. A two-period dynamic forward-looking model is constructed to derive the structural causal relationships among the decision to buy insurance, health behaviors (drinking, smoking, and exercise), and medical utilization. The model shows how exogenous changes in insurance prices and past behaviors can identify the direct and indirect effects of insurance on medical utilization. An empirical analysis also distinguishes between intensive and extensive margins (e.g., changes in the number of drinkers vs. the amount of alcohol consumed) of the insurance effect, which turns out to be empirically important. Health insurance is found to encourage less healthy behavior, particularly heavy drinking, but this does not yield a short-term perceptible increase in doctor or hospital visits. The effects of health insurance are primarily found at the intensive margin, e.g., health insurance may not cause a non-drinker to take up drinking, while it encourages a heavy drinker to drink even more.
These results suggest that to counteract behavioral moral hazard, health insurance should be coupled with incentives that target individuals who currently engage in unhealthy behaviors, such as heavy drinkers. Essay 2 examines the effect of repeating kindergarten on the retained children's academic performance. Although most existing research concludes that grade retention generates no benefits for retainees' later academic performance, holding low achieving children back has been a popular practice for decades. Drawing on a recently collected nationally representative data set in the US, this paper estimates the causal effect of kindergarten retention on the retained children's later academic performance. Since children are observed being held back only when they enroll in schools that permit retention, this paper jointly models 1) the decision of entering a school allowing for kindergarten retention, 2) the decision of undergoing a retention treatment in kindergarten, and 3) children's academic performance in higher grades. The retention treatment is modeled as a binary choice with sample selection. The outcome equations are linear regressions including the kindergarten retention dummy as an endogenous regressor with a correlated random coefficient. A control function estimator is developed for estimating the resulting double-hurdle treatment model, which allows for unobserved heterogeneity in the retention effect. As a comparison, a nonparametric bias-corrected nearest neighbor matching estimator is also implemented. Holding children back in kindergarten is found to have positive but diminishing effects on their academic performance up to the third grade. Essay 3 proves the semiparametric identification of a binary choice model having an endogenous regressor without relying on outside instruments. A simple estimator and a test for endogeneity are provided based on this identification. 
These results are applied to analyze working-age males' migration within the US, where labor income is potentially endogenous. Identification relies on the fact that the migration probability among workers is close to linear in age while labor income is nonlinear in age (when both are nonparametrically estimated). Using data from the PSID, this study finds that labor income is endogenous and that ignoring this endogeneity leads to downward bias in the estimated effect of labor income on the migration probability. / Thesis (PhD) — Boston College, 2009. / Submitted to: Boston College. Graduate School of Arts and Sciences. / Discipline: Economics.
|