Global ETD Search

31	Um estudo empírico sobre classificação de símbolos matemáticos manuscritos / An empirical study on handwritten mathematical symbol classification Oliveira, Marcelo Valentim de 25 August 2014 An important problem in the field of pattern recognition is handwriting recognition. The recognition of handwritten mathematical expressions is a particular case that has been studied for decades. It is considered challenging because of the large number of possible mathematical symbols, the intrinsic variation of handwriting, and the complex 2D arrangement of symbols within expressions. In this work we adopt the problem of recognizing online handwritten mathematical symbols in order to perform an empirical study on the behavior of multi-class classifiers. We examine basic methods for multi-class classification, especially the one-versus-all and all-versus-all approaches for decomposing a multi-class problem into a set of binary classification problems. To decompose the problem into smaller ones, we also propose an approach that uses a decision tree to hierarchically divide the whole dataset into subsets, in such a way that each subset corresponds to a simpler classification problem. These methods are examined using as base classifiers the k-nearest-neighbor model and the support vector machine model (the latter combined with the one-versus-all approach to merge the binary classifiers). For classification, symbols are represented by a set of features known in the literature as HBF49, which was recently proposed specifically for the recognition of online symbols. Experiments were performed to evaluate classifier accuracy, classifier performance as the number of classes increases, training and testing time, and the use of different subsets of the feature set.
This work includes a description of the needed background, details of the pre-processing and feature extraction techniques for symbol representation, and an exposition and discussion of the empirical studies performed. The additional data collected for the experiments will be made publicly available. hierarchical decomposition large classification problems mathematical symbols multiclass classification on-line handwriting
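The two decomposition schemes named in the abstract above can be illustrated with a minimal sketch. It only shows how a multi-class label list is split into binary problems; the symbol labels and function names are hypothetical, and nothing here reproduces the thesis's own implementation or its decision-tree variant.

```python
from itertools import combinations


def one_versus_all(labels):
    """One binary relabeling per class: that class (+1) against all others (-1)."""
    classes = sorted(set(labels))
    return {c: [1 if y == c else -1 for y in labels] for c in classes}


def all_versus_all(labels):
    """One binary problem per unordered class pair.

    Each problem keeps only the samples of its two classes and returns
    (sample indices, binary labels), with the first class of the pair as +1.
    """
    classes = sorted(set(labels))
    problems = {}
    for a, b in combinations(classes, 2):
        idx = [i for i, y in enumerate(labels) if y in (a, b)]
        problems[(a, b)] = (idx, [1 if labels[i] == a else -1 for i in idx])
    return problems


# Hypothetical symbol labels for six handwritten samples (four classes).
labels = ["x", "+", "=", "(", "x", "+"]
ova = one_versus_all(labels)
ava = all_versus_all(labels)
print(len(ova), "one-versus-all problems;", len(ava), "all-versus-all problems")
```

With K classes, one-versus-all yields K binary problems over the full dataset, while all-versus-all yields K(K-1)/2 smaller problems, which is one reason the two schemes can behave differently as the number of classes grows.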
32	Efficient Kernel Methods For Large Scale Classification Asharaf, S 07 1900 Classification algorithms have been widely used in many application domains. Most of these domains deal with massive collections of data and hence demand classification algorithms that scale well with the size of the data sets involved. A classification algorithm is said to be scalable if its time and space requirements do not increase significantly (without compromising generalization performance) as the training set grows. The Support Vector Machine (SVM) is one of the most celebrated kernel-based classification methods used in Machine Learning. An SVM capable of handling large scale classification problems would be an ideal candidate in many real world applications. The training process of an SVM classifier is usually formulated as a Quadratic Programming (QP) problem. The existing solution strategies for this problem have time and space complexity that is (at least) quadratic in the number of training points. This makes SVM training very expensive even on classification problems with a few thousand training examples. This thesis addresses the scalability of the training algorithms involved in both two-class and multiclass Support Vector Machines. Efficient training schemes that reduce the space and time requirements of the SVM training process are proposed as possible solutions. The classification schemes discussed in the thesis for handling large scale two-class classification problems are a) two selective-sampling-based training schemes for scaling non-linear SVM and b) clustering-based approaches for handling unbalanced data sets with the Core Vector Machine. To handle large scale multiclass classification problems, the thesis proposes the Multiclass Core Vector Machine (MCVM), a scalable SVM-based multiclass classifier.
In MCVM, the multiclass SVM problem is shown to be equivalent to a Minimum Enclosing Ball (MEB) problem and is then solved using a fast approximate MEB finding algorithm. Experimental studies were done with several large real world data sets, such as the IJCNN1 and Acoustic data sets from the LIBSVM page, the Extended USPS data set from the CVM page, and the DARPA network intrusion detection data sets used in the KDD 99 contest. The empirical results show that the proposed classification schemes achieve good generalization performance at low time and space requirements. Further, the scalability experiments done with large training data sets demonstrate that the proposed schemes scale well. A novel soft clustering scheme called Rough Support Vector Clustering (RSVC), employing the idea of the Soft Minimum Enclosing Ball (SMEB) problem, is another contribution discussed in this thesis. Experiments done with a synthetic data set and the real world IRIS data set show that RSVC finds meaningful soft cluster abstractions. Machine Learning Automatic Classification Kernel Method Classification Algorithms Support Vector Machine (SVM) Core Vector Machine (CVM) Rough Support Vector Clustering (RSVC) Multiclass Core Vector Machine (MCVM) Computer Science
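To give a feel for what an approximate Minimum Enclosing Ball computation looks like, here is a minimal sketch. It is an assumption-laden stand-in: it uses the simple Bădoiu–Clarkson farthest-point update in plain Euclidean space, whereas the thesis works in a kernel-induced feature space and does not state in this abstract which approximation algorithm it uses. The function name and the sample points are hypothetical.

```python
import math


def approx_meb(points, iterations=1000):
    """Approximate the minimum enclosing ball of a list of points.

    Repeatedly moves the center a shrinking step toward the farthest
    point (Badoiu-Clarkson style update). Returns (center, radius),
    where radius is the largest distance from the center to any point,
    so the returned ball always encloses every input point.
    """
    center = list(points[0])
    for i in range(1, iterations + 1):
        farthest = max(points, key=lambda p: math.dist(p, center))
        step = 1.0 / (i + 1)
        center = [c + step * (f - c) for c, f in zip(center, farthest)]
    radius = max(math.dist(p, center) for p in points)
    return center, radius


# Hypothetical data: corners of a square whose exact ball has radius sqrt(2).
square = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
center, radius = approx_meb(square)
print(center, radius)
```

Each iteration only needs the farthest point from the current center, which is why methods built on this idea can avoid solving the full quadratic program over all training points.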
33	Scheduling For Stable And Reliable Communication Over Multiaccess Channels And Degraded Broadcast Channels Kalyanarama Sesha Sayee, KCV 07 1900 Information-theoretic arguments focus on modeling the reliability of information transmission, assuming availability of infinite data at sources, thus ignoring randomness in message generation times at the respective sources. However, in information transport networks, not only is reliable transmission important, but also stability, i.e., finiteness of the mean delay incurred by messages from the time of generation to the time of successful reception. Usually, delay analysis is done separately using queueing-theoretic arguments, whereas reliable information transmission is studied using information theory. In this thesis, we investigate these two important aspects of data communication jointly by suitably combining models from these two fields. In particular, we model scheduled communication of messages that arrive in a random process, (i) over multiaccess channels, with either independent decoding or joint decoding, and (ii) over degraded broadcast channels. The scheduling policies proposed permit up to a certain maximum number of messages for simultaneous transmission. In the first part of the thesis, we develop a multi-class discrete-time processor-sharing queueing model, and then investigate the stability of this queue. In particular, we model the queue by a discrete-time Markov chain defined on a countable state space, and then establish (i) a sufficient condition for c-regularity of the chain, and hence positive recurrence and finiteness of the stationary mean of the function c of the state, and (ii) a sufficient condition for transience of the chain. These stability results form the basis for the conclusions drawn in the thesis. The second part of the thesis is on multiaccess communication with random message arrivals.
In the context of independent decoding, we assume that messages can be classified into a fixed number of classes, each of which specifies a combination of received signal power, message length, and target probability of decoding error. Each message is encoded independently and decoded independently. In the context of joint decoding, we assume that messages can be classified into a fixed number of classes, each of which specifies a message length, and for each of which there is a message queue. From each queue, some number of messages are encoded jointly, and received at a signal power corresponding to the queue. The messages are decoded jointly across all queues with a target probability of joint decoding error. For both independent decoding and joint decoding, we derive respective discrete-time multiclass processor-sharing queueing models assuming the corresponding information-theoretic models for the underlying communication process. Then, for both the decoding schemes, we (i) derive respective outer bounds to the stability region of message arrival rate vectors achievable by the class of stationary scheduling policies, (ii) show for any message arrival rate vector that satisfies the outer bound, that there exists a stationary “state-independent” policy that results in a stable system for the corresponding message arrival process, and (iii) show that the stability region of information arrival rate vectors, in the limit of large message lengths, equals an appropriate information-theoretic capacity region for independent decoding, and equals the information-theoretic capacity region for joint decoding. For independent decoding, we identify a class of stationary scheduling policies, for which we show that the stability region in the limit of large maximum number of simultaneous transmissions is independent of the received signal powers, and each of which achieves a spectral efficiency of 1 nat/s/Hz in the limit of large message lengths.
In the third and last part of the thesis, we show that the queueing model developed for multiaccess channels with joint decoding can be used to model communication over degraded broadcast channels, with superposition encoding and successive decoding across all queues. We then show respective results (i), (ii), and (iii), stated above. Broadcasting Channels Telecommunication Wireless Communication Systems Information Theory Coding And Decoding Multiaccess Communication Degraded Broadcast Channels Multiaccess Channels Communications Engineering
34	Detecção de fraude em hidrômetros utilizando técnicas de reconhecimento de padrões / Fraud detection in water meters using pattern recognition Detroz, Juliana Patrícia 26 February 2016 Coordenação de Aperfeiçoamento de Pessoal de Nível Superior / With the emerging water crisis, water shortage has become a major global concern. Water supply companies are increasingly looking for ways to reduce water loss, and considerable effort has gone into better management of this resource. Fraud detection is one of these actions, because illegal tampering is usually carried out precariously and therefore causes leaks. Hidden and apparent leaks are a major cause of high water loss rates. In this context, technology that automates the identification of potential fraud can be an important tool for avoiding water waste. This research applies pattern recognition techniques to the automated detection, through image analysis, of suspected irregularities in water meters. Evidence of tampering and missing seals were treated as potential fraud. The proposed computer vision system has three steps: detection of the water meter location, using the OPF classifier and the HOG descriptor; detection of the seals, through morphological image processing and segmentation methods; and classification of fraud, in which the condition of the water meter seals is assessed. The framework was validated on a dataset of images from water meter inspections. The water meter detection solution (HOG+OPF) achieved an average accuracy of 89.03%, outperforming SVM (linear and RBF).
A comparative analysis of 12 feature descriptors (color and texture) was performed for the seal-condition classification step. The descriptors were evaluated individually and in combination, reaching an average accuracy of up to 81.29%. We conclude that a computer vision system is a promising strategy with the potential to support fraud analysis and decision-making. Pattern recognition Fraud detection Computer vision Multiclass classifier
35	Alguns processos relacionados a modelos de fluxo de tráfego / Some processes related to traffic flow models Marcio Watanabe Alves de Souza 20 February 2009 In the present work we study interacting particle systems that can be seen as simple models of traffic flow, namely the Hammersley-Aldous-Diaconis process and the exclusion process. We explore their representations as growth models in the plane. Emphasis is given to cases where there is more than one kind of particle, to the multiclass processes, and to their relations with queueing models. Analogy between the models is used to prove the results. Finally, we give a new proof for the computation of the rescaled asymptotic variance of the flux of second-class particles in the multiclass Hammersley process in equilibrium. Exclusion process Hammersley process Multiclass processes Queues in tandem Traffic flow
36	Desenvolvimento de método simples e rápido para determinação multiclasse de resíduos de medicamentos veterinários em rim, fígado e músculo bovino por UHPLC-MS/MS / Development of a simple and quick method for multiclass determination of veterinary drug residues in bovine kidney, liver and muscle by UHPLC-MS/MS Rizzetti, Tiele Medianeira 10 March 2017 In the food safety area, the application of good agricultural practices is a growing concern both for public health in the Brazilian domestic market and for the country's competitiveness in the external market. To ensure the safety of food of animal origin, different types of samples are monitored and Maximum Residue Limits (MRLs) are adopted. Therefore, the development of appropriate analytical methods for residue determination is necessary. In this work a simple, fast and efficient multiclass sample preparation method was developed for the determination of veterinary drug residues in bovine kidney, liver and muscle. The determination step was performed by ultra-high-performance liquid chromatography–tandem mass spectrometry (UHPLC-MS/MS). The UHPLC-MS/MS and sample preparation conditions were optimized using experimental designs. Extraction and clean-up were performed by solid-liquid extraction and dispersive solid-phase extraction (d-SPE). Central composite designs were used to optimize the clean-up step. The optimized procedure consisted of extraction with acetonitrile followed by clean-up with EMR-Lipid® sorbent and a 5% (m/v) aqueous solution of trichloroacetic acid. The proposed method was validated according to the criteria of European Commission Decision 2002/657/EC. Linearity presented r² ≥ 0.99 for most of the evaluated compounds, and recovery and RSD values were in the range recommended by the EU. The decision limit (CCα) and detection capability (CCβ) presented values around the MRL of each compound.
Monensin was the only compound that did not present satisfactory results for bovine kidney and muscle. The developed sample preparation followed by UHPLC-MS/MS analysis was efficient for the determination of veterinary drug residues in bovine liver, kidney and muscle. The proposed methodology was successfully applied to real samples and in a proficiency test, and proved to be a good option for routine analysis. Multiclass Veterinary drugs Experimental design Bovine kidney, liver and muscle EMR-Lipid® UHPLC-MS/MS
37	"Investigação de estratégias para a geração de máquinas de vetores de suporte multiclasses" / Investigation of strategies for the generation of multiclass support vector machines Ana Carolina Lorena 16 February 2006 Several problems involve the classification of data into categories, also called classes.
Given a dataset whose classes are known, Machine Learning (ML) algorithms can be employed to induce a classifier able to predict the class of new data from the same domain, thus performing the desired discrimination. Among the several ML techniques applied to classification problems, Support Vector Machines (SVMs) are known for their high generalization ability. They were originally conceived for problems with only two classes, also named binary problems. However, several problems require the discrimination of examples into more than two categories or classes. This thesis investigates and proposes strategies for the generalization of SVMs to problems with more than two classes, known as multiclass problems. The focus of this work is on strategies that decompose the original multiclass problem into multiple binary subtasks, whose outputs are then combined to obtain the final classification. The proposed strategies aim to adapt the decompositions to each multiclass application considered, using information about the performance obtained in its solution or extracted from its examples. The implemented algorithms were evaluated on general datasets and on real applications from the Bioinformatics domain. The results obtained open several possibilities for future work. Among the benefits observed are simpler decompositions, which require fewer binary classifiers in the multiclass solution. Bioinformatics genetic algorithms minimum spanning trees multiclass problems support vector machines
39	Detecting Lateral Movement in Microsoft Active Directory Log Files: A supervised machine learning approach Uppströmer, Viktor, Råberg, Henning January 2019 Cyber attacks pose a major threat to companies and organisations worldwide. With the cost of a data breach reaching $3.86 million on average, demand is high for a rapid way to detect cyber attacks as early as possible. Advanced persistent threats (APT) are sophisticated cyber attacks that persist for a long time inside the network. During an APT, the attacker spreads its foothold over the network. This stage, which is one of the most critical steps in an APT, is called lateral movement. The purpose of the thesis is to investigate how, and how well, lateral movement can be detected with a machine learning approach. Five machine learning algorithms are compared using repeated cross-validation followed by statistical testing to determine the best performing algorithm and feature importance. The features used for training the classifiers are extracted from Active Directory domain controller log entries that relate to each other through a shared workstation name, IP address, or account name. These features are the basis of a semi-synthetic dataset, made up of a synthetic part and a real part, which poses a multiclass classification problem. The experiment concludes that all five algorithms perform with an accuracy of 0.998. RF displays the highest F1-score (0.88) and recall (0.858), SVM performs best on precision (0.972), and DT has the lowest training time (1237 ms). Based on these results, the thesis concludes that the algorithms RF, SVM, and DT perform best in different scenarios. For instance, SVM should be used if a low number of false positives is favoured. If balanced performance across multiple metrics is preferred, then RF will perform best.
The results also show that a significant number of the examined features can be disregarded in future experiments, as they do not affect the performance of any of the classifiers. Advanced Persistent Threat Lateral Movement Active Directory Multiclass Classification Intrusion Detection System Computer Systems
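The abstract above reports an accuracy of 0.998 alongside noticeably lower recall and F1 values. A minimal sketch of how these four metrics are computed from confusion counts shows how that can happen when one class is rare. The counts below are hypothetical and are not taken from the thesis; for a multiclass problem such counts would typically be formed per class, one class against the rest.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)   # fraction of raised alarms that are correct
    recall = tp / (tp + fn)      # fraction of real attacks that are caught
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical, heavily imbalanced log data: 100 attack events among 100,000.
metrics = binary_metrics(tp=80, fp=5, fn=20, tn=99895)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because the true negatives dominate the total, accuracy stays close to 1 even when a fifth of the rare class is missed, which is why precision, recall, and F1 are the more informative metrics for comparing classifiers on this kind of data.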
40	Prioritizing Discordant Chronic Comorbidities and Predicting the Medication Using Machine Learning Sharma, Ichchha Pradeep 07 August 2023 No description available. Computer Science Information Technology Health Care Management Health Care Artificial Intelligence
