41 |
Detecting Fraud in Affiliate Marketing: Comparative Analysis of Supervised Machine Learning Algorithms
Ahlqvist, Oskar. January 2023
Affiliate marketing has become a rapidly growing part of the digital marketing sector. However, fraud in affiliate marketing poses a serious threat to the trust and financial stability of the involved parties. This thesis investigates the performance of three supervised machine learning algorithms (random forest, logistic regression, and support vector machine) in detecting fraud in affiliate marketing. The objective is to answer the following main research question by answering two sub-questions: How much can Random Forest, Logistic Regression, and Support Vector Machine contribute to the detection of fraud in affiliate marketing? 1. How can the models be compared in an experiment? 2. How can they be optimized and applied within an affiliate marketing framework? To answer these questions, a dataset of transaction logs is analyzed in collaboration with an affiliate network company. The machine learning experiment employs k-fold cross-validation and the Area Under the ROC Curve (AUC-ROC) performance metric to evaluate the effectiveness of the classifiers in distinguishing fraudulent from non-fraudulent transactions. The results indicate that the random forest classifier performs best of the three models, achieving the highest mean AUC of 0.7172. Furthermore, feature importance analysis demonstrates that each feature category had a different impact on the performance of the models. The models compute feature importance differently, so some features had greater influence on specific models. By fine-tuning and optimizing the hyperparameters for each model, it is possible to enhance their performance. Despite certain limitations, such as time constraints, data availability, and security restrictions, this study highlights the potential of supervised machine learning algorithms. Random forest in particular showed how it could be used to improve fraud detection capabilities in affiliate marketing. The insights contribute to closing the knowledge gap in comparing the effectiveness of various classification methods and practical applications for fraud detection.
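The evaluation protocol named in this abstract, k-fold cross-validation scored by mean AUC-ROC, can be sketched as follows. This is a minimal illustration using only the Python standard library, not the thesis's pipeline: the data is synthetic, and `fit_mean_difference` is a hypothetical stand-in scorer rather than random forest, logistic regression, or a support vector machine. Any of those models could be plugged in through the same `fit` argument.

```python
import random
from statistics import mean

def auc_roc(labels, scores):
    """AUC as the probability that a random fraud case outscores a random
    legitimate case (ties count as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validated_auc(X, y, fit, k=5):
    """Mean AUC over k folds; fit(X_train, y_train) must return a scoring function."""
    aucs = []
    for held_out in k_fold_indices(len(X), k):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        score = fit([X[i] for i in train], [y[i] for i in train])
        aucs.append(auc_roc([y[i] for i in held_out],
                            [score(X[i]) for i in held_out]))
    return mean(aucs)

def fit_mean_difference(X_train, y_train):
    """Hypothetical stand-in model: signed distance from the midpoint
    between the two class means of a single feature."""
    fraud = mean(x for x, label in zip(X_train, y_train) if label == 1)
    legit = mean(x for x, label in zip(X_train, y_train) if label == 0)
    direction = 1.0 if fraud >= legit else -1.0
    midpoint = (fraud + legit) / 2
    return lambda x: direction * (x - midpoint)

# Synthetic, illustrative data: about 20% "fraud", one noisy feature.
rng = random.Random(42)
y = [1 if rng.random() < 0.2 else 0 for _ in range(500)]
X = [rng.gauss(1.0 if label else 0.0, 1.0) for label in y]

mean_auc = cross_validated_auc(X, y, fit_mean_difference, k=5)
print(f"mean AUC over 5 folds: {mean_auc:.3f}")
```

Averaging the per-fold AUC, rather than pooling all predictions, matches the "mean AUC" wording of the abstract; whether the thesis also stratified its folds is not stated, so the plain shuffle here is an assumption.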
|
42 |
Fraud Detection on Unlabeled Data with Unsupervised Machine Learning / Bedrägeridetektering på omärkt data med oövervakad maskininlärning
Renström, Martin; Holmsten, Timothy. January 2018
A common problem in systems handling user interaction was the risk of fraudulent behaviour. As an example, in a system with credit card transactions it could be a person using another user's account for purchases, or in a system with advertisements it could be bots clicking on ads. These malicious attacks were often disguised as normal interactions and could be difficult to detect. It was especially challenging when working with datasets that did not contain so-called labels, which showed whether a data point was fraudulent or not. This meant that there was no data that had previously been classified as fraud, which in turn made it difficult to develop an algorithm that could distinguish between normal and fraudulent behavior. In this thesis, the area of anomaly detection was explored with the intent of detecting fraudulent behavior without labeled data. Three neural-network-based prototypes were developed in this study. All three prototypes were variations of autoencoders. The first prototype, which served as a baseline, was a simple three-layer autoencoder; the second prototype was a novel architecture called a stacked autoencoder; and the third prototype was a variational autoencoder. The prototypes were then trained and evaluated on two different datasets, both of which contained non-fraudulent and fraudulent data. In this study it was found that the proposed stacked autoencoder architecture achieved better scores in recall, accuracy and NPV in the tests that were designed to simulate a real-world scenario. / Ett vanligt problem med användares interaktioner i ett system var risken för bedrägeri. För ett system som hanterade dataset med kreditkortstransaktioner så kunde ett exempel vara att en person använde en annans identitet för kortköp, eller i system som hanterade reklam så skulle det kunna ha varit en automatiserad mjukvara som simulerade interaktioner. 
Dessa attacker var ofta maskerade som normala interaktioner och kunde därmed vara svåra att upptäcka. Inom dataset som inte har korrekt märkt data så skulle det vara speciellt svårt att utveckla en algoritm som kan skilja på om interaktionen var avvikande eller inte. I denna avhandling så utforskas ämnet att upptäcka anomalier i dataset utan specifik data som tyder på att det var bedrägeri. Tre prototyper av neurala nätverk användes i denna studie som tränades och utvärderades på två dataset som innehöll både data som sade att det var bedrägeri och inte bedrägeri. Den första prototypen som fungerade som en bas var en simpel autoencoder med tre lager, den andra prototypen var en ny autoencoder som har fått namnet staplad autoencoder och den tredje prototypen var en variationell autoencoder. För denna studie så gav den föreslagna staplade autoencodern bäst resultat för återkallelse, noggrannhet och NPV i de test som var designade att efterlikna ett verkligt scenario.
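The three scores reported in this abstract (recall, accuracy and NPV) all follow from a confusion matrix. The sketch below shows how they could be computed once an anomaly detector's reconstruction errors have been turned into fraud flags. The error values, labels and the 0.5 threshold are invented for illustration, and the function names are hypothetical; none of this is taken from the thesis.

```python
def flag_by_threshold(errors, threshold):
    """Flag a data point as fraud (1) when its reconstruction error exceeds the threshold."""
    return [1 if e > threshold else 0 for e in errors]

def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) with fraud as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn

def recall_accuracy_npv(tp, fp, tn, fn):
    """Recall = TP/(TP+FN); accuracy = (TP+TN)/all; NPV = TN/(TN+FN)."""
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    npv = tn / (tn + fn) if tn + fn else 0.0
    return recall, accuracy, npv

# Illustrative values only: per-sample reconstruction errors and true labels.
errors = [0.02, 0.03, 0.90, 0.05, 0.70, 0.04, 0.01, 0.08]
labels = [0, 0, 1, 0, 1, 0, 0, 1]

predictions = flag_by_threshold(errors, threshold=0.5)
recall, accuracy, npv = recall_accuracy_npv(*confusion_counts(labels, predictions))
print(f"recall={recall:.3f} accuracy={accuracy:.3f} NPV={npv:.3f}")
```

NPV (negative predictive value) is the share of transactions passed as legitimate that really are legitimate, which is why it is informative when fraud is rare and most predictions are negative.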
|
43 |
Classification of Financial Transactions using Lightweight Memory Networks / Klassificering av finansiella transaktioner med hjälp av lätta minnesnätverk
Cui, Zhexin. January 2022
Various forms of fraud have substantially impacted our lives and caused considerable losses to some people. To reduce these losses, many researchers have devoted themselves to the study of fraud detection. As fraud detection has developed from expert-driven to data-driven systems, its scalability and accuracy have improved considerably. However, most existing fraud detection methods focus on the feature extraction and classification of a single transaction, ignoring the temporal and spatial long-term information from accounts. In this work, we propose to address these limitations by employing a lightweight memory network (LiMNet), which is a deep neural network that captures causal relations between temporal interactions. We evaluate our approach on two datasets, the Ether-Fraud dataset and the Elliptic dataset. The former is a brand new dataset collected from Etherscan with data mining, and the latter is published by the homonymous company. As a set of raw collected data never used before, the Ether-Fraud dataset had some issues, such as huge variation among values and incomplete information. We therefore processed Ether-Fraud with data supplementation and normalization, which solved these problems. A series of experiments was designed based on our analysis of the model and helped us to find the best hyper-parameter setting. We then compared the performance of the model with other baselines, and the results showed that LiMNet outperformed traditional algorithms on the Ether-Fraud dataset but was not as good as the graph-based method on the Elliptic dataset. Finally, we summarized the experience of applying the model to fraud detection, the strengths and weaknesses of the model, and future directions for improvement. / Olika former av bedrägerier har haft en betydande inverkan på våra liv och har orsakat stora förluster för vissa människor. 
För att minska dessa förluster har många forskare ägnat sig åt att studera upptäckt av bedrägerier. Efter utvecklingen av bedrägeriutredningen från expertdrivna till datadrivna system har skalbarheten och noggrannheten förbättrats avsevärt. De flesta av de befintliga metoderna för upptäckt av bedrägerier fokuserar dock på utvinning av funktioner och klassificering av en viss transaktion och ignorerar den temporala och spatiala långsiktiga informationen från konton. I det här arbetet föreslår vi att vi tar itu med dessa begränsningar genom att använda ett lättviktigt minnesnätverk (LiMNet), som är ett djupt neuralt nätverk som fångar kausala relationer mellan temporala interaktioner. Vi utvärderar vårt tillvägagångssätt på två datamängder, datamängden Ether-Fraud och Elliptic-datamängden. Det förstnämnda är ett helt nytt dataset som samlats in från Etherscan med hjälp av datautvinning, och det sistnämnda är publicerat av det homonyma företaget. Eftersom det rörde sig om råa insamlade data som aldrig använts tidigare hade Ether-Fraud-datasetet vissa problem, t.ex. en stor variation mellan värdena och ofullständig information. Därför har vi bearbetat Ether-Fraud med datatillägg och normalisering, vilket har löst dessa problem. En serie experiment utformades utifrån vår analys av modellen och hjälpte oss att hitta den bästa inställningen av hyperparametrar. Sedan jämförde vi modellens prestanda med andra baslinjer, resultaten visade att LiMNet överträffade traditionella algoritmer på datasetet Ether-Fraud men var inte lika bra som den grafbaserade metoden på datasetet Elliptic. Slutligen sammanfattade vi erfarenheterna av att tillämpa modellen på bedrägeridetektion, modellens styrkor och svagheter samt framtida riktningar för förbättringar.
|
44 |
Telecom Fraud Detection Using Machine Learning
Xiong, Chao. January 2022
International Revenue Sharing Fraud (IRSF) is one of the most persistent types of fraud within the telecommunications industry. According to the 2017 Communications Fraud Control Association (CFCA) fraud loss survey, IRSF costs 6 billion dollars a year. Therefore, the detection of such fraud is of vital importance to avoid further losses. Though many efforts have been made, very few utilize the temporal patterns of phone call traffic. This project, supported by Sinch’s real production data, aims to exploit both spatial and temporal patterns learned by a Graph Attention Neural Network (GAT) with a Gated Recurrent Unit (GRU) to find suspicious timestamps in the historical traffic. Moreover, combined with the time-independent Isolation Forest model, our model should give better results for the phone call records. This report first explains the mechanism of IRSF in detail and introduces the models that are applied in this project, including GAT, GRU, and Isolation Forest. It then presents how our experiments were conducted and the results with extensive analysis. We achieved 42.4% precision and 96.1% recall on the test data provided by Sinch, showing significant advantages over both previous work and baselines. / International Revenue Sharing Fraud (IRSF) är en av de mest ihållande typerna av bedrägerier inom telekommunikationsindustrin. Enligt 2017 Communications Fraud Control Association (CFCA) bedrägeriförlustundersökning kostar IRSF 6 miljarder dollar per år. Därför är upptäckten av sådana bedrägerier av avgörande betydelse för att undvika ytterligare förluster. Även om många ansträngningar har gjorts är det väldigt få som använder telefonsamtalstrafikens tidsmässiga mönster. Detta projekt, med stöd av Sinchs verkliga produktionsdata, syftar till att utnyttja både rumsliga och tidsmässiga mönster som lärts in av Graph Attention Neural Network (GAT) med Gated Recurrent Unit (GRU) för att hitta misstänkt tid i den historiska trafiken. 
Dessutom, i kombination med den tidsoberoende skogsmodellen Isolation, borde vår modell ge bättre resultat för telefonsamtalsposterna. Denna rapport förklarar först mekanismen för IRSF i detalj och introducerar modellerna som används i detta projekt, inklusive GAT, GRU och Isolation forest. Slutligen presenteras hur våra experiment har genomförts och resultaten med omfattande analys. Dessutom har vi uppnått 42.4% precision och 96.1% återkallelse på testdata från Sinch, vilket visar betydande fördelar jämfört med både tidigare arbete och baslinjer.
|
45 |
Bayesian Variable Selection with Shrinkage Priors and Generative Adversarial Networks for Fraud Detection
Issoufou Anaroua, Amina. 01 January 2024
This research paper focuses on fraud detection in the financial industry using Generative Adversarial Networks (GANs) in conjunction with univariate and multivariate Bayesian Models with Shrinkage Priors (BMSP). The problem addressed is the need for accurate and advanced fraud detection techniques due to the increasing sophistication of fraudulent activities. The methodology involves applying BMSP for variable selection and implementing GANs to generate synthetic fraud samples, with fraud detection then performed on the augmented dataset. Experimental results demonstrate the effectiveness of the BMSP-GAN approach in detecting fraud with improved performance compared to other methods. The conclusions highlight the potential of GANs and BMSP for enhancing fraud detection capabilities and suggest future research directions for further improvements in the field.
|
46 |
Detecting fraud in cellular telephone networks
Van Heerden, Johan H. 12 1900
Thesis (MSc)--University of Stellenbosch, 2005. / ENGLISH ABSTRACT: Cellular network operators globally lose between 3% and 5% of their annual revenue to
telecommunications fraud. Hence it is of great importance that fraud management systems
are implemented to detect, alarm, and shut down fraud within minutes, minimising
revenue loss. Modern proprietary fraud management systems employ (i) classification
methods, most often artificial neural networks learning from classified call data records to
classify new call data records as fraudulent or legitimate, (ii) statistical methods building
subscriber behaviour profiles based on the subscriber’s usage in the cellular network and
detecting sudden changes in behaviour, and (iii) rules and threshold values defined by
fraud analysts, utilising their knowledge of valid fraud cases and the false alarm rate as
guidance. The purpose of this thesis is to establish a context for and evaluate the performance
of well-known data mining techniques that may be incorporated in the fraud
detection process.
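Methods (ii) and (iii) in the paragraph above, behaviour profiles that detect sudden changes and analyst-defined thresholds, can be illustrated with a small sketch. It assumes a profile made of the mean and standard deviation of one usage figure per day; the function names, the z-score cutoff of 3.0, the `hard_limit` rule and the sample minutes are all hypothetical choices for illustration, not the techniques evaluated in the thesis.

```python
from statistics import mean, stdev

def build_profile(usage_history):
    """Summarise a subscriber's past usage as (mean, standard deviation)."""
    return mean(usage_history), stdev(usage_history)

def deviation_score(profile, observed):
    """Number of standard deviations by which `observed` departs from the profile mean."""
    mu, sigma = profile
    if sigma == 0:
        return 0.0 if observed == mu else float("inf")
    return abs(observed - mu) / sigma

def is_suspicious(usage_history, observed, z_threshold=3.0, hard_limit=None):
    """Raise an alarm if an analyst rule fires or usage departs sharply from the profile."""
    # Rule-based check: an analyst-defined absolute limit (hypothetical).
    if hard_limit is not None and observed > hard_limit:
        return True
    # Statistical check: sudden change relative to the subscriber's own history.
    return deviation_score(build_profile(usage_history), observed) > z_threshold

# Illustrative daily call minutes for one subscriber.
daily_minutes = [30, 25, 35, 28, 32, 31, 27]

print(is_suspicious(daily_minutes, 300))                # far outside the profile
print(is_suspicious(daily_minutes, 33))                 # ordinary day
print(is_suspicious(daily_minutes, 33, hard_limit=32))  # trips the fixed rule
```

Lowering `z_threshold` or `hard_limit` catches more fraud at the cost of more false alarms, which is the trade-off the abstract says analysts use as guidance.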
Firstly, a theoretical background of various well-known data mining techniques is
provided and a number of seminal articles on fraud detection, which influenced this thesis,
are summarised. The cellular telecommunications industry is introduced, including a brief
discussion of the types of fraud experienced by South African cellular network operators.
Secondly, the data collection process and the characteristics of the collected data are
discussed. Different data mining techniques are applied to the collected data, demonstrating
how user behaviour profiles may be built and how fraud may be predicted. An
appraisal of the performances and appropriateness of the different data mining techniques
is given in the context of the fraud detection process.
Finally, an indication of further work is provided in the conclusion to this thesis, in
the form of a number of recommendations for possible adaptations of the fraud detection
methods, and improvements thereof. A combination of data mining techniques that may
be used to build a comprehensive fraud detection model is also suggested. / AFRIKAANSE OPSOMMING: Sellulêre netwerk operateurs verloor wêreldwyd tussen 3% en 5% van hul jaarlikse inkomste
as gevolg van telekommunikasie bedrog. Dit is dus van die uiterse belang dat bedrog
bestuurstelsels geïmplimenteer word om bedrog op te spoor, alarms te genereer, en bedrog
binne minute te staak om verlies aan inkomste tot ’n minimum te beperk. Moderne
gepatenteerde bedrog bestuurstelsels maak gebruik van (i) klassifikasie metodes, mees
dikwels kunsmatige neurale netwerke wat leer vanaf geklassifiseerde oproep rekords en
gebruik word om nuwe oproep rekords as bedrog-draend of nie bedrog-draend te klassifiseer,
(ii) statistiese metodes wat gedragsprofiele van ’n intekenaar bou, gebaseer op die
intekenaar se gedrag in die sellulêre netwerk, en skielike verandering in gedrag opspoor,
en (iii) reëls en drempelwaardes wat deur bedrog analiste daar gestel word, deur gebruik
te maak van hulle ondervinding met geldige gevalle van bedrog en die koers waarteen
vals alarms gegenereer word. Die doel van hierdie tesis is om ’n konteks te bepaal vir
en die werksverrigting te evalueer van bekende data ontginningstegnieke wat in bedrog
opsporingstelsels gebruik kan word.
Eerstens word ’n teoretiese agtergrond vir ’n aantal bekende data ontginningstegnieke
voorsien en ’n aantal gedagteryke artikels wat oor bedrog opsporing handel en wat hierdie
tesis beïnvloed het, opgesom. Die sellulêre telekommunikasie industrie word bekend gestel,
insluitend ’n kort bespreking oor die tipes bedrog wat deur Suid-Afrikaanse sellulêre
telekommunikasie netwerk operateurs ondervind word.
Tweedens word die data versamelingsproses en die eienskappe van die versamelde
data bespreek. Verskillende data ontginningstegnieke word vervolgens toegepas op die
versamelde data om te demonstreer hoe gedragsprofiele van gebruikers gebou kan word
en hoe bedrog voorspel kan word. Die werksverrigting en gepastheid van die verskillende
data ontginningstegnieke word bespreek in die konteks van die bedrog opsporingsproses.
Laastens word ’n aanduiding van verdere werk in die gevolgtrekking tot hierdie tesis
verskaf, en wel in die vorm van ’n aantal aanbevelings oor moontlike aanpassings en verbeterings
van die bedrog opsporingsmetodes wat beskou en toegepas is. ’n Omvattende
bedrog opsporingsmodel wat gebruik maak van ’n kombinasie van data ontginningstegnieke
word ook voorgestel.
|
47 |
信用卡詐欺風險偵測之最佳化模式 – 影響信用卡詐欺風險偵測效率之因素研究 / A Fraud detection model in the credit card risk management process
葉慶信 (Yeh, Ching Hsin). Unknown date
自發展塑膠貨幣以來,信用卡的詐欺、盜刷,向來是造成發卡銀行重大損失的風險之一,不僅是當前社會的地下經濟犯罪問題,同時在全球化的過程中,盜刷集團也逐漸發展出跨國分工的犯罪模式,各國發卡銀行在防制信用卡詐欺的策略上,不得不隨之求新求變,以因應盜刷集團詭譎多變的犯罪手法。
在管理詐騙生命週期理論中,偵測是其中極為重要的環節。本篇研究,即是針對如何提升發卡銀行的信用卡詐欺交易偵測效率進行相關研究,藉由相關文獻探討及實務經驗,以歷史資料分析方式,找出影響信用卡詐欺交易偵測效率之因素,來檢視詐欺風險管控措施的成效,同時提供建議,希望找出信用卡詐欺風險偵測之最佳化模式,俾以提升發卡銀行信用卡詐欺交易偵測效率。
本篇研究結果,首先證實在符合筆者所提出的效益/成本的模式下,發送交易警示簡訊,有助於偽卡偵測效率的提升。再者,偵測人員的資歷越深,其偵測效率越高。最後,銀行詐欺交易資訊的交流越密集,越有助於「側錄盜刷」類的詐欺交易偵測效率。
本研究目的除了提出建議以協助銀行減少損失外,盼能藉以喚起社會各界對信用卡詐欺問題之重視,期能遏止當前社會,乃至於國際間信用卡詐欺問題。 / Credit card fraud has caused significant losses for issuing banks worldwide. Fraud syndicates have developed cross-border fraudulent activities with global partners, which not only affects the worldwide economy but also challenges the strategies adopted by all issuers.
Because the credit card process is sophisticated, experienced staff and informative messages are required to optimize the fraud detection process. The study analyzes the major factors affecting the credit card fraud detection process by applying an empirical analysis method to two real cases.
The major findings are: 1) under a careful cost-benefit calculation, the more transaction alert SMS messages are sent, the higher the fraud detection rate; 2) the more experienced the detection staff, the more effective the detection; and 3) the more information is shared across issuing banks, the higher the detection rate for skimming fraud.
In addition to optimizing current fraud detection operations, the study aims to draw public attention to the serious issue of credit card fraud. It is hoped that credit card fraud can be mitigated effectively by adopting the recommendations made in this study.
|
48 |
Detecção de fraudes em transações financeiras via Internet em tempo real. / Frauds detections in financial transactions via Internet in real time.
Kovach, Stephan. 15 June 2011
Um dos objetivos mais importantes de qualquer sistema de detecção de fraudes, independente de seu domínio de operação, é detectar o maior número de fraudes com menor número de alarmes falsos, também denominados de falsos positivos. A existência de falsos positivos é um fato inerente a qualquer sistema de detecção fraudes. O primeiro passo para alcançar esse objetivo é identificar os atributos que podem ser usados para diferenciar atividades legítimas das fraudulentas. O próximo passo consiste em identificar um método para cada atributo escolhido para efetuar essa distinção. A escolha adequada dos atributos e dos métodos correspondentes determina em grande parte o desempenho de um detector de fraudes tanto em termos da relação entre o número de fraudes detectadas e o número de falsos positivos, quanto em termos de tempo de processamento. O desafio desta escolha é maior ao se tratar de um detector de fraudes em tempo real, isto é, fazer a detecção antes que a fraude seja concretizada. O objetivo deste trabalho é apresentar a proposta de uma arquitetura de um sistema de detecção de fraudes em tempo real em transações bancárias via Internet, baseando-se em observações do comportamento local e global de usuários. O método estatístico baseado em análise diferencial é usado para obter a evidência local de uma fraude. Neste caso, a evidência de fraude é baseada na diferença entre os perfis de comportamento atual e histórico do usuário. A evidência local de fraude é fortalecida ou enfraquecida pelo comportamento global do usuário. Neste caso, a evidência de fraude é baseada no número de acessos efetuados em contas diferentes feitos pelo dispositivo utilizado pelo usuário, e por um valor probabilístico que varia com o tempo. A teoria matemática de evidências de Dempster-Shafer é utilizada para combinar estas evidências e obter um escore final. Este escore é então comparado com um limiar para disparar um alarme indicando a fraude. 
A principal inovação e contribuição deste trabalho estão na definição e exploração dos métodos de detecção baseados em atributos globais que são de natureza específica do domínio de transações financeiras. Os resultados da avaliação utilizando uma base de dados com registros de transações correspondentes a perfis reais de uso demonstraram que a integração de um detector baseado em atributos globais fez aumentar a capacidade do sistema de detectar fraudes em 20%. / One of the most important goals of any fraud detection system, regardless of its domain of operation, is to detect the largest number of frauds with the fewest false alarms, also called false positives. The existence of false positives is inherent to any fraud detection system. The first step in achieving this goal is to identify the attributes that can be used to differentiate between legitimate and fraudulent activities. The next step is to identify a method for each chosen attribute to make this distinction. The proper choice of the attributes and corresponding methods largely determines the performance of a fraud detector, not only in terms of the ratio between the number of detected frauds and the number of false positives, but also in terms of processing time. This choice is more challenging for fraud detection in real time, that is, detection before the fraud is carried out. The aim of this work is to propose an architecture for a real-time fraud detection system for Internet banking transactions, based on observations of local and global user behavior. A statistical method based on differential analysis is used to obtain the local evidence of fraud. In this case, the evidence of fraud is based on the difference between the user's current and historical behavior profiles. The local evidence of fraud is strengthened or weakened by the user's global behavior. In this case, the evidence of fraud is based on the number of accesses to different accounts made by the device used by the user, and on a probability value that varies over time. The Dempster-Shafer mathematical theory of evidence is applied to combine these pieces of evidence into a final fraud suspicion score. This score is then compared with a threshold to trigger an alarm indicating fraud. The main innovation and contribution of this work are the definition and exploration of detection methods based on global attributes that are specific to the domain of financial transactions. The evaluation results, using a database with records of transactions corresponding to actual usage profiles, showed that integrating a detector based on global attributes increased the system's capacity to detect fraud by 20%.
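The combination step described in this abstract, fusing local and global evidence with Dempster-Shafer theory and comparing the result with a threshold, can be sketched with Dempster's rule of combination. The two mass functions, the 0.7 alarm threshold and all names below are hypothetical values chosen for illustration; the thesis's actual evidence sources and parameters are not given in the abstract.

```python
from itertools import product

FRAUD = frozenset({"fraud"})
LEGIT = frozenset({"legit"})
EITHER = FRAUD | LEGIT  # mass assigned here expresses uncertainty

def combine(m1, m2):
    """Dempster's rule: multiply masses of intersecting hypotheses and
    renormalise by the non-conflicting mass."""
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            conflict += mass_a * mass_b  # evidence pointing at disjoint hypotheses
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict and cannot be combined")
    return {hypothesis: mass / (1.0 - conflict) for hypothesis, mass in combined.items()}

# Hypothetical evidence: local (current vs. historical user profile) and
# global (the device's accesses to different accounts).
local_evidence = {FRAUD: 0.6, EITHER: 0.4}
global_evidence = {FRAUD: 0.5, LEGIT: 0.2, EITHER: 0.3}

fused = combine(local_evidence, global_evidence)
ALARM_THRESHOLD = 0.7  # assumed value, for illustration only
alarm = fused[FRAUD] > ALARM_THRESHOLD
print(f"belief in fraud: {fused[FRAUD]:.3f}, alarm: {alarm}")
```

In this example the global evidence strengthens the local belief in fraud from 0.6 to about 0.77; global evidence that leaned towards "legit" would weaken it instead, which is the behaviour the abstract describes.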
|
49 |
Detecção de fraudes em cartões: um classificador baseado em regras de associação e regressão logística / Card fraud detection: a classifier based on association rules and logistic regression
Oliveira, Paulo Henrique Maestrello Assad. 11 December 2015
Os cartões, sejam de crédito ou débito, são meios de pagamento altamente utilizados. Esse fato desperta o interesse de fraudadores. O mercado de cartões enxerga as fraudes como custos operacionais, que são repassados para os consumidores e para a sociedade em geral. Ainda, o alto volume de transações e a necessidade de combater as fraudes abrem espaço para a aplicação de técnicas de Aprendizagem de Máquina; entre elas, os classificadores. Um tipo de classificador largamente utilizado nesse domínio é o classificador baseado em regras. Entretanto, um ponto de atenção dessa categoria de classificadores é que, na prática, eles são altamente dependentes dos especialistas no domínio, ou seja, profissionais que detectam os padrões das transações fraudulentas, os transformam em regras e implementam essas regras nos sistemas de classificação. Ao reconhecer esse cenário, o objetivo desse trabalho é propor a uma arquitetura baseada em regras de associação e regressão logística - técnicas estudadas em Aprendizagem de Máquina - para minerar regras nos dados e produzir, como resultado, conjuntos de regras de detecção de transações fraudulentas e disponibilizá-los para os especialistas no domínio. Com isso, esses profissionais terão o auxílio dos computadores para descobrir e gerar as regras que embasam o classificador, diminuindo, então, a chance de haver padrões fraudulentos ainda não reconhecidos e tornando as atividades de gerar e manter as regras mais eficientes. Com a finalidade de testar a proposta, a parte experimental do trabalho contou com cerca de 7,7 milhões de transações reais de cartões fornecidas por uma empresa participante do mercado de cartões. A partir daí, dado que o classificador pode cometer erros (falso-positivo e falso-negativo), a técnica de análise sensível ao custo foi aplicada para que a maior parte desses erros tenha um menor custo. 
Além disso, após um longo trabalho de análise do banco de dados, 141 características foram combinadas para, com o uso do algoritmo FP-Growth, gerar 38.003 regras que, após um processo de filtragem e seleção, foram agrupadas em cinco conjuntos de regras, sendo que o maior deles tem 1.285 regras. Cada um desses cinco conjuntos foi submetido a uma modelagem de regressão logística para que suas regras fossem validadas e ponderadas por critérios estatísticos. Ao final do processo, as métricas de ajuste estatístico dos modelos revelaram conjuntos bem ajustados e os indicadores de desempenho dos classificadores também indicaram, num geral, poderes de classificação muito bons (AROC entre 0,788 e 0,820). Como conclusão, a aplicação combinada das técnicas estatísticas - análise sensível ao custo, regras de associação e regressão logística - se mostrou conceitual e teoricamente coesa e coerente. Por fim, o experimento e seus resultados demonstraram a viabilidade técnica e prática da proposta. / Credit and debit cards are widely used payment methods, which attracts the interest of fraudsters. Businesses see fraudulent transactions as operating costs, which are passed on to consumers. The high volume of transactions and the need to combat fraud encourage the use of machine learning algorithms, among them rule-based classifiers. However, a weakness of these classifiers is that, in practice, they are highly dependent on professionals who detect patterns of fraudulent transactions, transform them into rules and implement these rules in the classifier. Given this scenario, the aim of this thesis is to propose an architecture based on association rules and logistic regression, techniques studied in Machine Learning, to mine rules from data, produce rule sets that detect fraudulent transactions, and make them available to domain experts. As a result, these professionals will have the aid of computers to discover the rules that support the classifier, decreasing the chance of undiscovered fraudulent patterns and increasing the efficiency of generating and maintaining these rules. To test the proposal, the experimental part of the thesis used almost 7.7 million transactions provided by a real company. After a long process of database analysis, 141 characteristics were combined using the FP-Growth algorithm, generating 38,003 rules. After a process of filtering and selection, they were grouped into five rule sets, the largest of which has 1,285 rules. Each of the five sets was subjected to logistic regression, so that its rules were validated and weighted by statistical criteria. At the end of the process, the goodness-of-fit tests were satisfied and the performance indicators showed very good classification power (AUC between 0.788 and 0.820). In conclusion, the combined application of statistical techniques (cost-sensitive learning, association rules and logistic regression) proved to be conceptually and theoretically cohesive and coherent. Finally, the experiment and its results demonstrated the technical and practical feasibility of the proposal.
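An association rule of the kind mined in this work has the form "if a transaction shows these characteristics, then it is fraud". The sketch below does not implement FP-Growth; it only computes the support, confidence and lift commonly used to rank and filter such rules, on a toy dataset. The attribute names, the transactions and the function name are hypothetical, and the abstract does not say which filtering criteria the thesis applied.

```python
def rule_metrics(transactions, antecedent, consequent="fraud"):
    """Support, confidence and lift of the rule: antecedent -> consequent.

    Each transaction is a set of attribute tokens; `antecedent` is a set of
    tokens that must all be present for the rule to fire.
    """
    total = len(transactions)
    matching = [t for t in transactions if antecedent <= t]
    matching_and_fraud = [t for t in matching if consequent in t]
    base_rate = sum(1 for t in transactions if consequent in t) / total

    support = len(matching_and_fraud) / total
    confidence = len(matching_and_fraud) / len(matching) if matching else 0.0
    lift = confidence / base_rate if base_rate else 0.0
    return support, confidence, lift

# Toy, hypothetical card transactions encoded as attribute sets.
transactions = [
    {"foreign_merchant", "night", "fraud"},
    {"foreign_merchant", "night", "fraud"},
    {"foreign_merchant", "night"},
    {"foreign_merchant"},
    {"night"},
    {"domestic"},
    {"domestic", "night"},
    {"domestic", "fraud"},
]

support, confidence, lift = rule_metrics(transactions, {"foreign_merchant", "night"})
print(f"support={support:.3f} confidence={confidence:.3f} lift={lift:.3f}")
```

A lift above 1 means the rule fires on fraud more often than the overall fraud rate would predict. In an architecture like the one described, each retained rule could then become a binary input to a logistic regression that weights it, though the exact encoding used in the thesis is not stated here.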
|