51

A method for measuring Internal Fraud Risk (IFR) of business organisations with ERP systems

Dayan, Imran January 2017 (has links)
ERP systems have shaped the way modern organisations design, control, and execute business processes. They have not only paved the way for efficient use of organisational resources but have also offered the opportunity to utilise data logged in the system for ensuring internal control. The key contribution of this research is a method that internal auditors can practically employ to measure the internal fraud risk of business organisations with ERP systems, utilising process mining and evidential reasoning in the form of Bayes' theorem, much more effectively than the conventional frequentist approach. The other significant contribution is that it paves the way for combining process mining and evidential reasoning to address problems prevalent in organisational contexts. This research contributes to developing IS theories for design and action, especially in the area of soft systems methodology, as it relies on business process modelling to address the issue of internal fraud risk. The chosen method also facilitates the incorporation of design science methods in problem solving. Researchers have focused on applying data mining techniques within organisational contexts to extract valuable information. Process mining is a comparatively new technique that allows business processes to be analysed from event logs. Such analysis can be useful to organisations not only for attaining greater efficiency but also for ensuring internal control. Large organisations have various measures in place for internal control, and measuring the risk of fraud within a business process is an important fraud prevention practice, as accurate measurement of fraud risk gives business experts the opportunity to comprehend the extent of the problem. Business experts such as internal auditors still rely heavily upon conventional methods for measuring internal fraud risk, by way of random checks of process compliance. Organisations with ERP systems can avail themselves of the opportunity to use event logs to extend the scope of assessing process conformance; this has not been put into practice because of a lack of well-researched methods that allow event logs to be utilised for enhancing internal control. This research can prove useful to practitioners as it has developed a method for measuring internal fraud risk within organisations. The research aimed to utilise a process mining technique that allows business experts to exert greater control over business process execution by allowing internal fraud risk to be measured effectively. A method has been developed for measuring the internal fraud risk of business organisations with ERP systems using process mining and Bayes' theorem. In this method, the rate of process deviation is calculated by conducting process mining on the relevant event logs; that rate is then applied in Bayes' theorem, together with the historic internal fraud risk rate and a manually calculated process deviation rate, to arrive at a revised internal fraud risk rate. Bayes' theorem was relied upon in developing the new method because it allows evidential reasoning to be incorporated. The method has been developed as a Design Science Research Method (DSRM) artefact by conducting three case studies.
Data were collected from three case companies, operating in the ready-made garments, pharmaceuticals, and aviation industries, regarding their procurement processes, and process mining was conducted on these data. The revised internal fraud risk rates were then evaluated against feedback from the business experts of each case company. The proposed method is beneficial as it paves the way for practitioners to utilise process mining within a soft systems methodology. The developed method is significant in that it contributes to the field of business intelligence and analytics (BI&A) and to big data analytics, which have become increasingly important to both academics and practitioners over the past couple of decades.
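The abstract does not spell out the exact update formula, but the described revision can be illustrated with a minimal sketch of Bayes' theorem, in which the mined deviation evidence updates the historic fraud rate. All variable names and rates below are hypothetical, not values from the thesis.

    # Hedged sketch: revising an internal fraud risk rate with Bayes' theorem.
    # All rates are hypothetical; the thesis derives them from process mining
    # of ERP event logs and from manual audits.

    def revised_fraud_risk(prior_fraud_rate, deviation_rate_given_fraud,
                           overall_deviation_rate):
        """P(fraud | deviation) = P(deviation | fraud) * P(fraud) / P(deviation)."""
        return deviation_rate_given_fraud * prior_fraud_rate / overall_deviation_rate

    prior = 0.02            # historic internal fraud risk rate (assumed)
    dev_given_fraud = 0.90  # deviation rate among fraudulent cases (assumed)
    overall_dev = 0.10      # deviation rate mined from the event logs (assumed)

    print(f"Revised risk: {revised_fraud_risk(prior, dev_given_fraud, overall_dev):.2%}")

With these illustrative rates the revised risk is 18%, showing how strong deviation evidence can sharply raise a low prior fraud rate.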
52

Detecção de fraude em hidrômetros utilizando técnicas de reconhecimento de padrões / Fraud detection in water meters using pattern recognition

Detroz, Juliana Patrícia 26 February 2016 (has links)
With the emerging hydric crisis, water shortage has become a great global concern. Water supply companies are increasingly looking for solutions to reduce water wastage, and many efforts have been made to promote better management of this resource. Fraud detection is one of these actions, since irregular tampering is usually carried out precariously and thus causes leaks; hidden and apparent leakage is a major cause of high water-loss rates. In this context, using technology to automate the identification of potential fraud can be an important support tool for avoiding water waste. This research applies pattern recognition techniques to automate the detection of suspected irregularities in water meters through image analysis; a case is considered potentially fraudulent when there is evidence of tampering or of missing seals. The proposed computer vision system is composed of three steps: detection of the water meter location, using the OPF classifier with the HOG descriptor; detection of the estimated seal area, through morphological image processing and segmentation methods; and classification of frauds, in which the condition of the water meter seals is assessed. The framework was validated on a dataset of images from water meter inspections. The water meter detection solution (HOG+OPF) achieved an average accuracy of 89.03%, outperforming SVM with both linear and RBF kernels. For the seal-condition classification step, a comparative analysis of 12 colour and texture feature descriptors was performed; their results were evaluated individually and in combination, reaching average accuracies of up to 81.29%. We conclude that a computer vision system is a promising strategy, with the potential to benefit and support fraud detection analysis and decision making.
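As a rough illustration of the first stage only, the sketch below extracts HOG descriptors from image crops and trains a window classifier. The Optimum-Path Forest (OPF) classifier used in the thesis is not part of the mainstream Python stack, so the linear SVM baseline stands in for it here; the random crops and labels are stand-in data, not the inspection dataset.

    # Hedged sketch of HOG-based water meter detection, not the thesis's
    # exact pipeline. Random crops stand in for real inspection photos.
    import numpy as np
    from skimage.feature import hog
    from skimage.transform import resize
    from sklearn.model_selection import train_test_split
    from sklearn.svm import LinearSVC

    rng = np.random.default_rng(0)
    images = [rng.random((80, 80)) for _ in range(60)]  # stand-in grayscale crops
    labels = [i % 2 for i in range(60)]                 # stand-in meter/background labels

    def hog_features(img):
        # Normalise crop size so every descriptor has the same length.
        img = resize(img, (64, 64), anti_aliasing=True)
        return hog(img, orientations=9, pixels_per_cell=(8, 8),
                   cells_per_block=(2, 2))

    X = np.array([hog_features(img) for img in images])
    y = np.array(labels)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                        random_state=0)
    clf = LinearSVC().fit(X_train, y_train)  # OPF would replace this step
    print("window accuracy:", clf.score(X_test, y_test))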
53

[en] STUDY OF DATA MINING METHODS APPLIED TO THE FINANCIAL MANAGEMENT OF MUNICIPALITIES / [pt] ESTUDO DE MÉTODOS DE MINERAÇÃO DE DADOS APLICADOS À GESTÃO FAZENDÁRIA DE MUNICÍPIOS

WILFREDO MAMANI TICONA 09 October 2018 (has links)
Taxes collected by city halls are reverted to the common good: investments (such as infrastructure) and the funding of public goods and services such as health, safety, and education. Predicting future tax revenues is one of the challenging tasks city halls face. It is an important task, because the information obtained from these predictions is valuable for supporting decisions related to the city hall's strategic planning. The investigation of municipal tax prediction models based on intelligent techniques is therefore of great importance to municipal administration. Accordingly, one objective of this dissertation was to develop two tax-revenue prediction models using neural networks: one considering only endogenous variables, and another considering both endogenous and exogenous variables. Another major challenge for city halls is irregularity in tax payments (error or fraud), which also harms strategic planning. Inspecting every taxpayer each month is impossible, given the disproportion between the number of taxpayers and the small number of tax inspectors. Research into methods based on intelligent techniques for flagging possible suspects of irregularity is therefore important to the work of tax inspectors. Thus, another objective of this dissertation was to develop a model to identify possible suspects of irregularities in the payment of the ISSQN (a Brazilian tax on services of any nature). The prediction models were evaluated in three case studies using data from the municipality of Araruama: two case studies for the endogenous-variable model, predicting active debt revenues and tax revenues respectively, and a third case study for the ISSQN prediction model, which uses both endogenous and exogenous variables. The predictions yielded results judged promising, despite the limitations of the data used in the case studies. Regarding irregularity, although it was not possible to evaluate the obtained results, the tool may be used as an indicator for new inspections.
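The abstract does not specify the network architecture; as a minimal sketch of the endogenous-variable idea, the example below trains a small multilayer perceptron to predict next-month revenue from the previous twelve months of the same series. The synthetic series, lag count, and hyperparameters are all assumptions.

    # Hedged sketch: predicting next-month tax revenue from its own past
    # values (endogenous variables only). `revenue` is a stand-in series.
    import numpy as np
    from sklearn.neural_network import MLPRegressor

    rng = np.random.default_rng(0)
    revenue = 100 + np.arange(120) * 0.5 + rng.normal(0, 5, 120)  # stand-in data

    lags = 12  # use the previous 12 months as endogenous inputs
    X = np.array([revenue[i:i + lags] for i in range(len(revenue) - lags)])
    y = revenue[lags:]

    model = MLPRegressor(hidden_layer_sizes=(16,), max_iter=5000, random_state=0)
    model.fit(X[:-12], y[:-12])              # hold out the final year
    print("test R^2:", model.score(X[-12:], y[-12:]))
    # An exogenous-variable model would append columns such as inflation or
    # employment indices to each row of X.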
54

Redes bayesianas aplicadas à modelagem de fraudes em cartão de crédito / Bayesian networks applied to credit card fraud modelling

Ramos, Jhonata Emerick 21 August 2015 (has links)
Fraud detection models are used to identify whether a transaction is legitimate or fraudulent based on registration and transactional information. The technique proposed in this dissertation is Bayesian networks (BN); their results were compared with logistic regression (LR), a technique widely used in the market. The Bayesian networks evaluated were Bayesian classifiers with the Naive Bayes structure, chosen for their simplicity and efficiency. The network structures were obtained from real data provided by a financial institution. The database was split into development and validation samples by ten-fold cross-validation. Model performance was evaluated using the confusion matrix and the area under the ROC curve. The analyses revealed slightly better performance for logistic regression compared with the Bayesian classifiers. Logistic regression was chosen as the most suitable model because it performed better at predicting fraudulent operations according to the confusion matrix; based on the area under the ROC curve, it also demonstrated greater ability to discriminate the operations that are correctly classified from those that are not.
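A minimal sketch of the comparison described above, using scikit-learn in place of the thesis's actual modelling stack: Naive Bayes and logistic regression are scored by ROC AUC under ten-fold cross-validation. The real bank data is not available, so a synthetic imbalanced dataset stands in.

    # Hedged sketch: Naive Bayes vs. logistic regression, ten-fold CV, ROC AUC.
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.naive_bayes import GaussianNB

    X, y = make_classification(n_samples=5000, n_features=20, weights=[0.97],
                               random_state=0)  # fraud as the rare class

    for name, clf in [("naive bayes", GaussianNB()),
                      ("logistic regression", LogisticRegression(max_iter=1000))]:
        auc = cross_val_score(clf, X, y, cv=10, scoring="roc_auc")
        print(f"{name}: mean AUC = {auc.mean():.3f}")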
55

Detecção de fraudes em cartões: um classificador baseado em regras de associação e regressão logística / Card fraud detection: a classifier based on association rules and logistic regression

Paulo Henrique Maestrello Assad Oliveira 11 December 2015 (has links)
Credit and debit cards are heavily used payment methods, a fact that attracts fraudsters. The card market treats fraud as an operating cost, which is passed on to consumers and to society in general. The high volume of transactions and the need to combat fraud open space for machine learning techniques, among them classifiers. Rule-based classifiers are widely used in this domain, but in practice they depend heavily on domain experts: professionals who detect the patterns of fraudulent transactions, turn them into rules, and implement those rules in the classification systems. Recognising this scenario, this work proposes an architecture based on association rules and logistic regression, techniques studied in machine learning, to mine rules from data and produce sets of rules for detecting fraudulent transactions, making them available to domain experts. These professionals then have the computer's help in discovering and generating the rules that underpin the classifier, reducing the chance that fraudulent patterns remain unrecognised and making rule generation and maintenance more efficient. To test the proposal, the experimental part of the work used about 7.7 million real card transactions provided by a company in the card market. Since the classifier can make errors (false positives and false negatives), cost-sensitive analysis was applied so that most of those errors carry a lower cost. After extensive analysis of the database, 141 features were combined, using the FP-Growth algorithm, to generate 38,003 rules which, after filtering and selection, were grouped into five rule sets, the largest containing 1,285 rules. Each of the five sets was submitted to logistic regression modelling so that its rules were validated and weighted by statistical criteria. At the end of the process, the goodness-of-fit metrics revealed well-fitted models, and the classifiers' performance indicators showed, in general, very good classification power (area under the ROC curve between 0.788 and 0.820). In conclusion, the combined application of the statistical techniques (cost-sensitive analysis, association rules, and logistic regression) proved conceptually and theoretically cohesive and coherent, and the experiment and its results demonstrated the technical and practical feasibility of the proposal.
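A minimal sketch of the proposed two-stage architecture, under the assumption that mlxtend's FP-Growth implementation is an acceptable stand-in: association rules are mined first, then logistic regression validates and weights them as binary features. The toy transaction table and attribute names are purely illustrative.

    # Hedged sketch: mine rules with FP-Growth, then weight them with
    # logistic regression. The one-hot transaction table is illustrative.
    import pandas as pd
    from mlxtend.frequent_patterns import fpgrowth, association_rules
    from sklearn.linear_model import LogisticRegression

    df = pd.DataFrame({"foreign_merchant": [1, 1, 0, 1, 0, 1],
                       "night_time":       [1, 0, 0, 1, 1, 1],
                       "high_amount":      [1, 1, 0, 1, 0, 0]}).astype(bool)
    fraud = pd.Series([1, 1, 0, 1, 0, 1])  # hypothetical labels

    itemsets = fpgrowth(df, min_support=0.3, use_colnames=True)
    rules = association_rules(itemsets, metric="confidence", min_threshold=0.6)

    # Turn each rule's antecedent into a 0/1 feature per transaction.
    X = pd.DataFrame({i: df[list(r)].all(axis=1).astype(int)
                      for i, r in enumerate(rules["antecedents"])})
    clf = LogisticRegression().fit(X, fraud)  # coefficients weight the rules
    print(clf.coef_)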
56

A Cloud Based Platform for Big Data Science

Islam, Md. Zahidul January 2014 (has links)
With the advent of cloud computing, resizable, scalable infrastructures for data processing are now available to everyone. Software platforms and frameworks that support data-intensive distributed applications, such as Amazon Web Services and Apache Hadoop, give users the necessary tools and infrastructure to work with thousands of scalable computers and process terabytes of data. However, writing scalable applications that run on top of these distributed frameworks is still a demanding and challenging task. The thesis aimed to advance the core scientific and technological means of managing, analyzing, visualizing, and extracting useful information from large data sets, collectively known as "big data". The term "big data" in this thesis refers to large, diverse, complex, longitudinal and/or distributed data sets generated from instruments, sensors, internet transactions, email, social networks, Twitter streams, and/or all digital sources available today and in the future. We introduced architectures and concepts for implementing a cloud-based infrastructure for analyzing large volumes of semi-structured and unstructured data. We built and evaluated an application prototype for collecting, organizing, processing, visualizing, and analyzing data from the retail industry, gathered from indoor navigation systems and social networks (Twitter, Facebook, etc.). Our finding was that developing a large-scale data analysis platform is often quite complex when the processed data is expected to grow continuously, and the right architecture varies depending on requirements. To build a data warehouse and analyze the data afterwards (batch processing), the best choices are Hadoop clusters with Pig or Hive, an architecture proven at Facebook and Yahoo for years. On the other hand, if the application involves real-time data analytics, the recommendation is Hadoop clusters with Storm, which has been used successfully at Twitter. After evaluating the developed prototype, we introduced a new architecture able to handle large-scale batch and real-time data, and proposed an upgrade of the existing prototype to handle real-time indoor navigation data.
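As a minimal sketch of the batch style recommended above, the pair of Hadoop Streaming scripts below counts indoor-navigation readings per store zone; the one-record-per-line, tab-separated log format is an assumption, not the thesis's actual schema.

    # mapper.py -- hedged sketch of a Hadoop Streaming batch job.
    # Assumes each input line is "zone<TAB>visitor_id" (hypothetical format).
    import sys

    for line in sys.stdin:
        zone, _visitor = line.rstrip("\n").split("\t")
        print(f"{zone}\t1")  # emit one count per indoor-navigation reading

    # reducer.py -- sums counts per zone; Hadoop sorts mapper output by key.
    import sys
    from itertools import groupby

    pairs = (line.rstrip("\n").split("\t") for line in sys.stdin)
    for zone, group in groupby(pairs, key=lambda kv: kv[0]):
        print(f"{zone}\t{sum(int(n) for _, n in group)}")

The two scripts would be submitted with Hadoop's streaming jar (for example, hadoop jar hadoop-streaming.jar -mapper mapper.py -reducer reducer.py -input logs -output zone_counts); in the real-time variant, a Storm topology would replace this pipeline.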
57

An ontological approach for monitoring and surveillance systems in unregulated markets

Younis Zaki, Mohamed January 2013 (has links)
Ontologies are a key factor in information management, as they provide a common representation for any domain. Historically, the finance domain has suffered from a lack of efficiency in managing vast amounts of financial data and from a lack of communication and knowledge sharing between analysts. In particular, with the growth of fraud in financial markets, cases are challenging and complex and involve huge volumes of information, and gathering facts and evidence is often difficult. The impetus for building a financial fraud ontology thus arises from the continuous improvement and development of financial market surveillance systems with high analytical capabilities for capturing fraud, which is essential to guarantee and preserve an efficient market. This thesis proposes an ontology-based approach for financial market surveillance systems. The proposed ontology acts as a semantic representation of concepts mined from unstructured resources and other internet sources (a corpus). The ontology contains a comprehensive concept system that can act as a semantically rich knowledge base for a market monitoring system. This could help fraud analysts understand financial fraud practices, assist open investigations by managing the relevant facts gathered for case investigations, provide early detection of fraudulent activities, develop prevention practices, and share manipulation patterns from prosecuted cases with investigators and relevant users. The usefulness of the ontology is evaluated through three case studies, which not only help to explain how market manipulation works but also demonstrate how the ontology can be used as a framework for the extraction process and for capturing information related to financial fraud, improving the performance of surveillance systems in fraud monitoring. Given that most manipulation cases occur in unregulated markets, this thesis uses a sample of fraud cases from those markets. On the empirical side, the thesis presents novel applications of text-mining tools and data-processing components, developing off-line surveillance systems as fully working prototypes that could train the ontology in the most recent manipulation techniques.
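A tiny illustrative fragment of what such an ontology could look like in code, using rdflib; the namespace and class names are hypothetical, not the thesis's actual concept system.

    # Hedged sketch: a small fragment of a financial-fraud ontology in
    # RDF/OWL. Class names and the namespace are illustrative only.
    from rdflib import Graph, Literal, Namespace
    from rdflib.namespace import OWL, RDF, RDFS

    FRAUD = Namespace("http://example.org/fraud#")  # hypothetical namespace
    g = Graph()
    g.bind("fraud", FRAUD)

    g.add((FRAUD.ManipulationScheme, RDF.type, OWL.Class))
    g.add((FRAUD.PumpAndDump, RDFS.subClassOf, FRAUD.ManipulationScheme))
    g.add((FRAUD.PumpAndDump, RDFS.comment,
           Literal("Inflate a stock's price with misleading promotion, then sell.")))

    # A surveillance system could attach mined evidence to case individuals:
    g.add((FRAUD.case42, RDF.type, FRAUD.PumpAndDump))
    print(g.serialize(format="turtle"))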
58

An offender’s perspective of what motivates, deters and prevents white collar crime in the South African workplace

Muto, Luigi 28 July 2012 (has links)
The aim of the research was to examine the motivations behind white-collar crime and, by means of the insights gained, allow businesses to better understand these motivations and the loopholes that exist with respect to white-collar crime. Empowered with such knowledge, businesses can enhance their fraud-mitigation policies and approaches, contributing to sustained operations and increased shareholder value by reducing losses. Face-to-face interviews were held with 29 white-collar offenders imprisoned at the Johannesburg Medium Correctional Centre in Gauteng, South Africa. Data were collected from these interviews and grouped into themes related to the research questions. An action plan was formulated to assist business in its fight to eliminate commercial crime and reduce its impact. / Dissertation (MBA)--University of Pretoria, 2012. / Gordon Institute of Business Science (GIBS)
59

Anomaly Detection of Time Series Caused by International Revenue Share Fraud : Additive Model and Autoencoder Applications

Wang, Lingxiao January 2023 (has links)
In this paper, we compare the performance of two methods for finding attempts at fraud in data provided by Sinch (formerly CLX Communications, a telecommunications and cloud communications platform-as-a-service (PaaS) company). We treat the problem as finding anomalies in a time-series signal, ignoring the duration of individual calls and other features and considering only the total volume of calls in a given period. We compare Seasonal and Trend decomposition using Loess (STL) with an autoencoder-decoder in the scenario of finding anomalies within a certain period. It turns out that additive models like STL can discriminate trending anomalies, while with the autoencoder-decoder anomalies can easily be found using local information, which makes the method convenient to apply. A remaining problem is that unsupervised learning methods usually require manual inspection; in practical applications, many iterations with experts are needed to find the most suitable method for a given scenario.
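A minimal sketch of the STL side of the comparison: decompose a call-volume series, then flag hours whose residual exceeds a simple three-sigma threshold. The synthetic hourly series and the threshold rule are assumptions, not the thesis's data or tuning.

    # Hedged sketch: flagging call-volume anomalies from STL residuals.
    # The synthetic hourly series stands in for the real traffic data.
    import numpy as np
    import pandas as pd
    from statsmodels.tsa.seasonal import STL

    rng = np.random.default_rng(1)
    hours = pd.date_range("2023-01-01", periods=24 * 28, freq="h")
    volume = (100 + 20 * np.sin(2 * np.pi * np.arange(len(hours)) / 24)
              + rng.normal(0, 3, len(hours)))
    volume[500:510] += 60  # injected fraud-like traffic burst

    result = STL(pd.Series(volume, index=hours), period=24).fit()
    resid = result.resid
    threshold = 3 * resid.std()  # simple 3-sigma rule (an assumption)
    print(hours[np.abs(resid.values) > threshold])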
60

Federated Learning with FEDn for Financial Market Surveillance

Voltaire Edoh, Isak January 2022 (has links)
Machine Learning (ML) is the current trend that most industries opt for to improve their business and operations. ML has also been adopted in the financial markets, where well-funded financial institutions employ the latest ML algorithms to gain an advantage on the market. The darker side of ML is the potential emergence of complex algorithmic trading schemes that are abusive and manipulative. Because of this, it is inevitable that ML will be applied to financial market surveillance in order to detect these abusive and manipulative trading strategies. Ideally, an accurate ML detection model would be developed with data from many financial institutions or trading venues. However, such ML models require vast quantities of data, which poses a problem in market surveillance, where data is sensitive or limited. Data sharing between companies or countries is typically accompanied by legal and privacy concerns. By training ML models on distributed datasets, Federated Learning (FL) overcomes these issues by eliminating the need to centralise sensitive data. This thesis aimed to address these ML-related issues in market surveillance by implementing and evaluating a FL model. FL enables a group of independent data-holding clients with the same intention to build a shared ML model collaboratively without compromising private data. In this work, a ML model is initially deployed in a centralised data setting and trained to detect the manipulative trading scheme known as spoofing; the LSTM autoencoder was the method chosen for this task. The same model is also implemented in a federated setting with decentralised data, using the FL framework FEDn. Another FL framework, Flower, is also employed to benchmark the performance of FEDn. Experiments were conducted comparing the FL models to the conventional centralised learning model, as well as comparing the two frameworks to each other. The results showed that under certain circumstances the FL models performed better than the centralised model in detecting spoofing, and that FEDn was equivalent to Flower in terms of detection performance. In addition, the results indicated that Flower was marginally faster than FEDn; variations in the experimental setup and stochasticity are assumed to account for this disparity.
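The abstract does not detail the aggregation step, but both FEDn and Flower coordinate some variant of federated averaging; the sketch below shows the idea, with client weights and dataset sizes that are purely illustrative.

    # Hedged sketch of the federated averaging step that frameworks such as
    # FEDn and Flower coordinate: clients train locally, the server averages
    # their parameters weighted by local dataset size.
    import numpy as np

    def federated_average(client_weights, client_sizes):
        """Size-weighted average of per-layer parameter arrays across clients."""
        total = sum(client_sizes)
        return [sum(w[layer] * n / total
                    for w, n in zip(client_weights, client_sizes))
                for layer in range(len(client_weights[0]))]

    # Three hypothetical trading venues with differently sized local datasets.
    clients = [[np.full((2, 2), v), np.full(2, v)] for v in (1.0, 2.0, 3.0)]
    sizes = [100, 200, 700]
    global_model = federated_average(clients, sizes)
    print(global_model[0])  # closest to client 3, which holds the most data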
