  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
151

Dolování sekvenčních vzorů / Sequential Pattern Mining

Tisoň, Zdeněk January 2012
This master's thesis focuses on knowledge discovery in databases, especially on methods for mining sequential patterns, which are described in detail. The work also deals with extending the Microsoft SQL Server Analysis Services platform with new mining algorithms. In the practical part, plugins for mining sequential patterns are implemented for MS SQL Server. In the last part, these algorithms are compared on different data sets.
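
To make the notion of a sequential pattern concrete, here is a minimal sketch of how the support of a candidate pattern could be counted over a sequence database. It is not taken from the thesis: the function names and the toy purchase data are hypothetical, and real mining algorithms avoid this brute-force scan.

```python
from typing import List, Sequence


def contains_pattern(sequence: Sequence[str], pattern: Sequence[str]) -> bool:
    """Return True if `pattern` occurs in `sequence` in order (gaps allowed)."""
    remaining = iter(sequence)
    # `item in remaining` consumes the iterator, so order is enforced.
    return all(item in remaining for item in pattern)


def support(database: List[Sequence[str]], pattern: Sequence[str]) -> float:
    """Fraction of sequences in the database that contain the pattern."""
    if not database:
        return 0.0
    hits = sum(contains_pattern(seq, pattern) for seq in database)
    return hits / len(database)


# Hypothetical toy database of purchase sequences (illustration only).
db = [
    ["bread", "milk", "beer"],
    ["bread", "beer"],
    ["milk", "bread"],
    ["beer", "bread", "milk"],
]

print(support(db, ["bread", "beer"]))
```

A pattern is usually called frequent when its support reaches a user-chosen minimum threshold.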
152

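Teoria da ressonância adaptativa através da linguagem Java para detecção e classificação de e-mails indesejados / Adaptive Resonance Theory in the Java language for the detection and classification of unwanted e-mails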

Santos Junior, Carlos Roberto dos. January 2013
Advisor: Anna Diva Plasencia Lotufo / Co-advisor: Maria do Carmo Gomes da Silveira / Committee: Mara Lúcia Martins Lopes / Committee: Benedito Isaias de Lima Lopes / Abstract: Unsolicited messages in electronic communication media remain a concern, even though the problem arose before the Internet became popular. Wasted bandwidth, lost time, productivity and data, and delays in reading legitimate e-mails are some of the problems that unsolicited messages, or spam, can cause. Several automatic e-mail filtering techniques are presented in the literature, but many of them cannot adapt, although one of the main aspects of the problem in real systems is that it is dynamic: spam constantly changes its characteristics in order to evade filtering techniques. This work develops an anti-spam filter that uses a preprocessing technique available in the literature, in which e-mails undergo feature extraction and selection, and an Artificial Neural Network based on Adaptive Resonance Theory to detect and classify spam. Such neural networks have a great capacity for generalization and adaptability, important characteristics for good anti-spam filter performance. The proposed model is tested in order to validate the efficiency of the filter. / Degree: Master's
153

How does toxicity change depending on rank in League of Legends?

Herner, William, Leiman, Edward January 2019
This thesis investigates toxic remarks in three ranks in League of Legends: Bronze, Gold, and Diamond. The purpose is to understand how toxic communication between players changes with rank. A framework from Neto, Alvino and Becker (2018) was adopted to define and count toxic remarks. Data were gathered through participant observation: fifteen games were played in each of the three ranks. Each game was recorded, transcribed, and analyzed by assigning each registered toxic remark to one of Neto, Alvino and Becker's predetermined categories. The study concluded that domain language is used more often by higher-ranked players, meaning that their toxic remarks tend to rely on game slang that requires previous game knowledge to understand. In contrast, low-ranked players tend to stick to basic complaints and insults when directing toxic remarks at teammates.
154

Spamerkennung mit Support Vector Machines / Spam Detection with Support Vector Machines

Möller, Manuel 22 June 2005
Starting from a presentation of the theoretical foundations of automatic text classification, this thesis shows that Support Vector Machines, which originate in Statistical Learning Theory, are suited to contributing to more precise detection of unsolicited e-mail advertising. In a test environment with a corpus of 20,000 e-mails, test runs with various parameters of the preprocessing and of the Support Vector Machine were automatically evaluated and visualized graphically. Building on this, an extension for the open-source software SpamAssassin is described that adds classification by Support Vector Machine to the existing classification mechanisms.
155

數位時代下垃圾訊息法制之建置---以美國法為藍本 / Building a Legal Framework for Spam in the Digital Age: Using U.S. Law as a Model

蔡欣惠, Tsai, Hsin-huei Unknown Date
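By the time you read this research proposal, twenty junk e-mails (commonly called spam) may have poured into your inbox within five minutes. According to Ferris Research, society spends as much as US$10 billion a year on containing spam, and according to statistics from the International Telecommunication Union (ITU), spam wastes a further US$25 billion worldwide every year. These startling figures convey a message: for most people, hearing "You've Got Mail!" is no longer a pleasant sound. Spamhaus survey reports show that Taiwan and HINET have long been a major source of outgoing spam; AOL once blocked mail sent from HINET, which caused considerable trouble for HINET users in Taiwan. With the arrival of the era of Digital Convergence, telemarketing, mobile spam, SMS spam, and VoIP, in addition to e-mail spam, are all battlegrounds for spammers. Yet the draft Anti-UCE Act (濫發商業電子郵件管理條例草案) prepared by Taiwan's Executive Yuan expressly regulates only junk "mail" and does not reach other junk messages. If the law does not address this issue early, the draft may be obsolete before it even leaves the Legislative Yuan. This thesis therefore comprehensively reviews the types of junk messages that may emerge in the digital age, in order to build a more complete legal framework in advance. The research method is as follows. First, the thesis conducts comparative law research on the content and legislative background of the U.S. legal regime for junk messages. Taiwan's draft mainly draws on U.S. law but differs in several respects, for example on whether Subject Line Labeling is required. On this point, the author previously published, in a lawyers' journal, an opinion dissenting from the U.S. Federal Trade Commission (FTC) research report that opposed requiring marketers to label the subject line of advertising mail, and recommended that Taiwan's draft adopt the opposite rule. The design of the targets and thresholds of penalties also differs greatly; for example, the draft contains no criminal penalties. As to the regulated subject matter, the draft is expressly limited to junk "mail" and cannot regulate the increasingly serious new forms of unsolicited commercial messages, such as those received through wireless transmission equipment or mobile devices. The thesis therefore also describes and analyzes legal regimes for emerging junk messages other than junk e-mail, as a reference for future legislation and enforcement. Second, the thesis studies domestic and foreign cases to understand how the law operates in practice. Because Taiwanese law currently cannot handle the new legal problem of spam, prosecutors have no law to apply, and the draft, once passed, may also prove insufficient, so an in-depth understanding of practical cases is necessary. Third, the thesis comprehensively reviews the content of the draft and proposes a legal framework better suited to the Digital Convergence era, in order to block junk messages more effectively. Stanford professor Dr. Dan Boneh notes in "the Difficulties of Tracing Spam Email" that spammers' techniques change constantly and are hard to guard against. It is foreseeable that spam will grow more serious as technology evolves. With the arrival of Digital Convergence, telemarketing, mobile spam, Short Message Service (SMS) messages, Voice over Internet Protocol, and Multimedia Messaging Service (MMS) picture messages, in addition to junk e-mail, are all battlegrounds for spammers, and it is necessary to guard against these emerging types of spam. / Within the five minutes it takes you to read this essay, your e-mail inbox may already have received 20 spam messages. Ferris Research has pointed out that the cost to society of blocking spam has reached US$10 billion per year, and according to the International Telecommunication Union (ITU), the annual global cost of spam is US$25 billion. These startling figures convey a message: for most people, "You've got mail!" is no longer welcome. According to a survey conducted by Spamhaus, Taiwan is a leading source of spam. AOL once blocked all e-mail coming from Hinet, which caused great difficulty for Taiwanese Internet users. With the coming of the Digital Convergence era, new forms are emerging besides e-mail spam, such as mobile spam, telemarketing calls, SMS spam, and VoIP spam, giving spammers all kinds of opportunities to attack. However, Taiwan's draft Anti-UCE Act addresses only e-mail spam. If the law does not address the broader issue early on, it may be outmoded even before it is passed. The US remains the main source of reference for Taiwan in the area of technology law. Long ago, before the US enacted the "Can-Spam Act," there was "Shiksaa." I would like to do in-depth research on American cyber and technology law so that I can develop a suitable legal solution to Taiwan's very serious UCE problem, reduce the losses to society and to business productivity caused by spam, eliminate Taiwan's bad reputation as a main spam exporter, and spur e-commerce development. My research project is as follows: 1. Examine the inner traits of various spam regulations and do interdisciplinary research. 2. Use case-based and comparative law study to gather practical material. 3. Combine the research results from technology and law to contribute to the ultimate resolution of spam.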
156

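Teoria da ressonância adaptativa através da linguagem Java para detecção e classificação de e-mails indesejados / Adaptive Resonance Theory in the Java language for the detection and classification of unwanted e-mails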

Santos Junior, Carlos Roberto dos [UNESP] 28 February 2013
Unsolicited messages in electronic communication media remain a concern, even though the problem arose before the Internet became popular. Wasted bandwidth, lost time, productivity and data, and delays in reading legitimate e-mails are some of the problems that unsolicited messages, or spam, can cause. Several automatic e-mail filtering techniques are presented in the literature, but many of them cannot adapt, although one of the main aspects of the problem in real systems is that it is dynamic: spam constantly changes its characteristics in order to evade filtering techniques. This work develops an anti-spam filter that uses a preprocessing technique available in the literature, in which e-mails undergo feature extraction and selection, and an Artificial Neural Network based on Adaptive Resonance Theory to detect and classify spam. Such neural networks have a great capacity for generalization and adaptability, important characteristics for good anti-spam filter performance. The proposed model is tested in order to validate the efficiency of the filter.
157

Classificação de conteúdo malicioso baseado em Floresta de Caminhos Ótimos / Malicious content classification based on Optimum-path Forest

Fernandes, Dheny [UNESP] 19 May 2016
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) / The advent of the Internet has brought widespread benefits in communication, entertainment, shopping, and social relations, among other areas. However, several threats began to emerge in this scenario, leading researchers to create tools to deal with them. Spam, malware, malicious content, phishing, fraud, and false URLs are examples of these threats. In response, antivirus systems, firewalls, and intrusion detection and prevention systems are examples of tools to combat them. Especially since 2010, led by the Stuxnet malware, threats have become much more complex and persistent, making the tools used until then obsolete. The reason is that such tools, based on signatures and anomalies, cannot keep up with either the speed at which threats develop or their complexity. Since then, researchers have turned their attention to more effective methods of combating cyber threats. In this context, machine learning algorithms are being explored in the search for solutions that analyze threats from the Internet in real time. This study therefore aims to analyze the performance of classifiers based on Optimum-path Forest (OPF), comparing them with other state-of-the-art classifiers. To do so, two feature extraction methods are analyzed: one based on tokens and the other based on N-grams with N equal to 3. Overall, OPF stood out most in not blocking legitimate messages and in training time. On some datasets the amount of correctly classified spam was also high. The OPF version that uses a complete graph was better, although in some cases the version with a k-NN graph outperformed it. Given current security demands, OPF, with its fast training time, can be improved in effectiveness with a view to a real application. Regarding the feature extraction methods, 3-grams were superior, improving the results obtained by OPF.
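
The two feature extraction methods compared in this abstract can be illustrated with a short sketch. It rests on assumptions: the abstract does not say whether the N-grams are built over characters or words, so character 3-grams are assumed here, and the function names and normalization rules are hypothetical rather than the thesis's own.

```python
import re
from collections import Counter


def token_features(text: str) -> Counter:
    """Bag of lowercase alphanumeric word tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def char_ngram_features(text: str, n: int = 3) -> Counter:
    """Bag of overlapping character n-grams (assumed reading of 'Ngrams')."""
    # Collapse whitespace so line breaks do not create spurious n-grams.
    normalized = re.sub(r"\s+", " ", text.lower()).strip()
    return Counter(normalized[i:i + n] for i in range(len(normalized) - n + 1))


message = "Free money! Claim your FREE prize"
print(token_features(message).most_common(3))
print(char_ngram_features(message).most_common(3))
```

Character n-grams tolerate obfuscated spellings better than whole tokens, which is one plausible reason for a result like the one reported, though the abstract itself gives no explanation.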
158

Deteção de Spam baseada na evolução das características com presença de Concept Drift / Spam detection based on feature evolution in the presence of concept drift

Henke, Márcia 30 March 2015
CAPES - Coordenação de Aperfeiçoamento de Pessoal de Nível Superior / Electronic messages (e-mails) are still considered the most significant tools in business and personal applications because of their low cost and ease of access. However, e-mail has become a major problem owing to the large amount of unwanted mail, called spam, that fills users' mailboxes. Among the many problems caused by spam, it is currently the main vector for spreading malicious activities such as viruses, worms, Trojans, phishing, and botnets, among others. Such activities allow the attacker to gain improper access to confidential data and trade secrets, or to invade victims' privacy to obtain some advantage. Several approaches, commercial and academic, have been proposed to prevent the sending of unsolicited e-mail, such as filters implemented in e-mail servers, spam classification mechanisms that let users define when a particular subject or author is a source of spam, and even filters implemented in network electronic components.
In general, e-mail filtering approaches analyze message content to determine whether a message is spam. A major problem with this approach is spam detection in the presence of concept drift. The literature defines concept drift as changes in the concept underlying the data over time, such as a change in the features that describe an attack or the appearance of new features. Many Intrusion Detection Systems (IDS) use machine learning techniques to monitor the classification error rate in order to detect change. However, by the time detection occurs, some damage has already been done to the system, which requires updating the classification process and intervention by the system operator. To minimize these problems, this thesis proposes a change detection method called the Method oriented to the Analysis of the Development of Attacks Characteristics (MECA). The method consists of three steps: 1) classification model training; 2) concept drift detection; and 3) transfer learning. The first step generates classification models as is commonly done in machine learning. The second step introduces two new strategies to deal with concept drift: HFS (Historical-based Features Selection), which analyzes the evolution of the features based on their history over time; and SFS (Similarity-based Features Selection), which analyzes the evolution of the features from the level of similarity between the feature vectors of the source and target domains. Finally, the third step focuses on what, how, and when to transfer acquired knowledge. The answer to the first question is provided by the concept drift detection strategies, which identify the new features and store them for transfer. To answer the second question, the feature representation transfer approach is employed. Finally, the new knowledge is transferred as soon as changes that compromise the performance of the classification task are identified.
The proposed method was developed and validated using two public datasets, one of which was built during this thesis. The experimental results showed that it is possible to infer a threshold for detecting changes so that the classification model is kept up to date through knowledge transfer. In addition, the MECA architecture can perform the classification task and concept drift detection as two parallel, independent tasks. Finally, MECA uses the SVM (Support Vector Machines) machine learning algorithm, which adheres less closely to the training samples. The results obtained with MECA showed that changes can be detected by monitoring feature evolution before the classification model degrades significantly.
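
The idea behind the similarity-based strategy (SFS), comparing feature vectors of a source and a target domain against a threshold, can be sketched as follows. This is only an illustration under stated assumptions: cosine similarity, the 0.5 threshold, and all names are hypothetical choices, since the abstract does not specify the similarity measure or the threshold MECA uses.

```python
import math
from collections import Counter
from typing import Iterable, List


def feature_vector(messages: Iterable[List[str]]) -> Counter:
    """Aggregate per-message feature lists into one frequency vector."""
    counts: Counter = Counter()
    for features in messages:
        counts.update(features)
    return counts


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse frequency vectors (0.0 if either is empty)."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def drift_detected(source: Counter, target: Counter, threshold: float = 0.5) -> bool:
    """Flag drift when the domains are less similar than the (assumed) threshold."""
    return cosine_similarity(source, target) < threshold


# Hypothetical feature lists for older (source) and newer (target) spam.
source = feature_vector([["viagra", "free"], ["free", "offer"]])
target = feature_vector([["crypto", "wallet"], ["wallet", "free"]])
print(drift_detected(source, target))
```

Because the check looks only at feature vectors, it can run alongside classification without waiting for the error rate to rise, which matches the motivation given in the abstract.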
159

Categorizing Blog Spam

Bevans, Brandon 01 June 2016
The internet has matured into the focal point of our era. Its ecosystem is vast, complex, and in many regards unaccounted for. One of the most prevalent aspects of the internet is spam. Like the rest of the internet, spam has evolved from simply meaning ‘unwanted emails’ to a blanket term that encompasses any unsolicited or illegitimate content that appears in the wide range of media on the internet. Many forms of spam permeate the internet, and spam architects continue to develop tools and methods to avoid detection. On the other side, cyber security engineers continue to develop more sophisticated detection tools to curb the harmful effects of spam. This virtual arms race has no end in sight. Most efforts thus far have gone toward accurately distinguishing spam from ham, and rightfully so, since initial detection is essential. However, research is lacking on the current ecosystem of spam, spam campaigns, and the behavior of the botnets that drive the majority of spam traffic. This thesis focuses on characterizing spam, particularly the spam that appears in forums, where it is delivered by bots posing as legitimate users. Forum spam is used primarily to push advertisements or to boost other websites’ perceived popularity by including HTTP links in the content of the post. We conduct an experiment to collect a sample of the blog posts and network activity of the spambots on the internet. We then present a corpus available for analysis and proceed with our own analysis. We cluster associated groups of users and IP addresses into entities, which we accept as a model of the underlying botnets that interact with our honeypots. We use Natural Language Processing (NLP) and Machine Learning (ML) to determine that semantic-based models of botnets are sufficient for distinguishing them from one another. We also find that the syntactic structure of posts varies little from botnet to botnet. Finally, we confirm that, to a large degree, botnet behavior and content hold across different domains.
160

Aplikace Bayesovských sítí / Bayesian Networks Applications

Chaloupka, David January 2013
This master's thesis deals with possible applications of Bayesian networks. The theoretical part is mainly of mathematical nature. At first, we focus on general probability theory and later we move on to the theory of Bayesian networks and discuss approaches to inference and to model learning while providing explanations of pros and cons of these techniques. The practical part focuses on applications that demand learning a Bayesian network, both in terms of network parameters as well as structure. These applications include general benchmarks, usage of Bayesian networks for knowledge discovery regarding the causes of criminality and exploration of the possibility of using a Bayesian network as a spam filter.
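
As an illustration of a Bayesian network used as a spam filter, the sketch below implements naive Bayes, the simplest such network, in which the class node is the only parent of every word node. The abstract does not say which network structure the thesis learns, so this is an assumed stand-in; the class name, labels, and training data are hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple


class NaiveBayesSpamFilter:
    """Naive Bayes text classifier with Laplace (add-one) smoothing."""

    LABELS = ("spam", "ham")

    def __init__(self) -> None:
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts: Counter = Counter()
        self.vocabulary: set = set()

    def train(self, examples: List[Tuple[List[str], str]]) -> None:
        """Learn the parameters (word and class counts) from labeled messages."""
        for words, label in examples:
            self.doc_counts[label] += 1
            self.word_counts[label].update(words)
            self.vocabulary.update(words)

    def log_score(self, words: List[str], label: str) -> float:
        """Unnormalized log posterior: log P(label) + sum of log P(word | label)."""
        total_docs = sum(self.doc_counts.values())
        score = math.log((self.doc_counts[label] + 1) / (total_docs + len(self.LABELS)))
        total_words = sum(self.word_counts[label].values())
        denominator = total_words + len(self.vocabulary)
        for word in words:
            score += math.log((self.word_counts[label][word] + 1) / denominator)
        return score

    def classify(self, words: List[str]) -> str:
        """Return the label with the highest score."""
        return max(self.LABELS, key=lambda label: self.log_score(words, label))


# Hypothetical training data (illustration only).
spam_filter = NaiveBayesSpamFilter()
spam_filter.train([
    (["free", "money", "now"], "spam"),
    (["win", "free", "prize"], "spam"),
    (["meeting", "at", "noon"], "ham"),
    (["lunch", "meeting", "tomorrow"], "ham"),
])
print(spam_filter.classify(["free", "prize"]))
```

Here the structure is fixed and only the parameters are learned; learning the structure as well, as the abstract describes, would also allow dependencies between words.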
