Global ETD Search

41	Generation of Software Test Data from the Design Specification Using Heuristic Techniques. Exploring the UML State Machine Diagrams and GA Based Heuristic Techniques in the Automated Generation of Software Test Data and Test Code.
Doungsa-ard, Chartchai. January 2011
Software testing is a tedious and very expensive undertaking. This research therefore proposes automatic test data generation to help testers reduce their workload and ascertain software quality. Test-driven development (TDD) has become increasingly popular over the past several years. According to TDD, test data should be prepared before code implementation begins. This research therefore asserts that test data should be generated from the software design documents, which are normally created before the code is implemented. Among such design documents, UML state machine diagrams are selected as the platform for the proposed automated test data generation mechanism because they show the behaviours of a single object in the system. A genetic algorithm (GA) based approach has been developed and applied to search for the right amount of quality test data. Finally, the generated test data have been used together with UML class diagrams to generate JUnit test code. The GA-based test data generation methods have been enhanced to handle the parallel path and loop problems of UML state machines. In addition, the proposed GA-based approach also targets diagrams with parameterised triggers. As a result, the proposed framework generates test data from the basic state machine diagram and the basic class diagram without any additional nonstandard information, whereas most other approaches require additional information or generate test data from other formal languages. The transition coverage achieved by the proposed approach is also high; the generated test data can therefore cover most of the behaviour of the system.
EU Asia-Link project TH/Asia Link/004(91712) East-West and CAMT
Software testing; Heuristic techniques; Software test data; Software test code; Automatic test data generation; Software quality; Genetic algorithm (GA) approach
42	Bayesian Variable Selection with Shrinkage Priors and Generative Adversarial Networks for Fraud Detection
Issoufou Anaroua, Amina. 01 January 2024
This research paper focuses on fraud detection in the financial industry using Generative Adversarial Networks (GANs) in conjunction with Univariate and Multivariate Bayesian Models with Shrinkage Priors (BMSP). The problem addressed is the need for accurate and advanced fraud detection techniques as fraudulent activities grow more sophisticated. The methodology involves implementing GANs and applying BMSP for variable selection in order to generate synthetic fraud samples, and then performing fraud detection on the augmented dataset. Experimental results demonstrate that the BMSP GAN approach detects fraud with improved performance compared to other methods. The conclusions highlight the potential of GANs and BMSP for enhancing fraud detection capabilities and suggest directions for future research in the field.
Categorical Data Analysis; Data Science
43	IMPROVING THE UTILITY OF DIFFERENTIALLY PRIVATE ALGORITHMS USING DATA CHARACTERISTICS
Farzad Zafarani (11837222). 10 January 2025
As data continues to grow rapidly in volume and complexity, there is an increasing need to extract meaningful insights from it. These datasets often contain sensitive individual information, making privacy protection crucial. Differential privacy has become the de facto standard for protecting individuals' privacy. Many datasets also have known constraints and structures. Can these known constraints or structures be leveraged to design mechanisms with better utility?
The focus of this thesis is to demonstrate that by leveraging the inherent structures and constraints within datasets, it may be possible to design differential privacy mechanisms that offer better utility (i.e., more accurate results) while maintaining the required level of privacy. This involves exploring advanced techniques and modifications to the basic mechanisms that take advantage of dataset-specific properties, such as sparsity, distributional assumptions, or other contextual information. This approach aims to minimize the noise added, thereby improving the utility of differentially private outputs.
In many scenarios, datasets contain constraints. In this thesis, we show that generating differentially private synthetic data while preserving constraints increases utility across several metrics, including marginal queries, classification task accuracy, and clustering. Smooth sensitivity is a data-dependent sensitivity metric that allows for more precise noise addition based on the actual data distribution, rather than worst-case scenarios. It addresses the limitations of local sensitivity by ensuring robust privacy guarantees, even in the presence of outliers or small changes in the data.
We have developed a differentially private Naive Bayes model using smooth sensitivity. By using data-dependent sensitivity measures like smooth sensitivity and incorporating known data constraints, we can reduce the amount of noise added, resulting in a more accurate model.
Data and information privacy; Naive Bayes Classifier; Differential Privacy; Privacy; Synthetic Data Generation; Smooth Sensitivity
44	Generation of software test data from the design specification using heuristic techniques : exploring the UML state machine diagrams and GA based heuristic techniques in the automated generation of software test data and test code
Doungsa-ard, Chartchai. January 2011
Software testing is a tedious and very expensive undertaking. This research therefore proposes automatic test data generation to help testers reduce their workload and ascertain software quality. Test-driven development (TDD) has become increasingly popular over the past several years. According to TDD, test data should be prepared before code implementation begins. This research therefore asserts that test data should be generated from the software design documents, which are normally created before the code is implemented. Among such design documents, UML state machine diagrams are selected as the platform for the proposed automated test data generation mechanism because they show the behaviours of a single object in the system. A genetic algorithm (GA) based approach has been developed and applied to search for the right amount of quality test data. Finally, the generated test data have been used together with UML class diagrams to generate JUnit test code. The GA-based test data generation methods have been enhanced to handle the parallel path and loop problems of UML state machines. In addition, the proposed GA-based approach also targets diagrams with parameterised triggers. As a result, the proposed framework generates test data from the basic state machine diagram and the basic class diagram without any additional nonstandard information, whereas most other approaches require additional information or generate test data from other formal languages. The transition coverage achieved by the proposed approach is also high; the generated test data can therefore cover most of the behaviour of the system.
005.3
45	Development of artificial intelligence-based in-silico toxicity models : data quality analysis and model performance enhancement through data generation
Malazizi, Ladan. January 2008
Toxic compounds, such as pesticides, are routinely tested against a range of aquatic, avian and mammalian species as part of the registration process. The need to reduce dependence on animal testing has led to increasing interest in alternative methods such as in silico modelling. QSAR (Quantitative Structure Activity Relationship) based models are already in use for predicting physicochemical properties, environmental fate, eco-toxicological effects, and specific biological endpoints for a wide range of chemicals. Data plays an important role in modelling QSARs and in analysing the results of toxicity testing processes. This research addresses a number of issues in predictive toxicology. One issue is data quality. Although a large amount of toxicity data is available from online sources, this data may contain unreliable samples and may therefore be of low quality. Its presentation may also be inconsistent across sources, which makes the information difficult to access, interpret and compare. To address this issue we started with a detailed investigation of, and experimental work on, the DEMETRA data. The DEMETRA datasets were produced by the EC-funded project DEMETRA. Based on the investigation, the experiments and the results obtained, the author identified a number of data quality criteria in order to provide a solution for data evaluation in the toxicology domain. An algorithm has also been proposed to assess data quality before modelling. Another issue considered in the thesis is missing values in toxicology datasets. The least squares method for a paired dataset and serial correlation for a single-version dataset provided solutions to the problem in two different situations. A procedural algorithm using these two methods has been proposed to overcome the problem of missing values. A further issue addressed in this thesis is the modelling of multi-class datasets in which the class distribution is severely imbalanced. Imbalanced data affect the performance of classifiers during classification. We have shown that, as long as we understand how class members are arranged in the dimensional space within each cluster, we can reshape the distribution and provide more domain knowledge to the classifier.
615.9
46	Uma abordagem para geração de dados de teste para o teste de mutação utilizando técnicas baseadas em busca / An approach for test data generation in mutation testing using search-based techniques
Souza, Francisco Carlos Monteiro. 24 May 2017
Mutation testing is a powerful test criterion for detecting faults and measuring the effectiveness of a test data set. However, it is a computationally expensive testing technique. The high cost comes mainly from the effort needed to generate test data adequate to kill the mutants and from the existence of equivalent mutants. This thesis presents an approach called Reach, Infect and Propagation to Mutation Testing (RIP-MuT) that generates test data and suggests equivalent mutants. The approach is composed of two modules: (i) automated test data generation using hill climbing and a fitness scheme based on the Reach, Infect, and Propagate (RIP) conditions; and (ii) a method that suggests equivalent mutants based on analysis of the RIP conditions during test data generation. Experiments were conducted to evaluate the effectiveness of the RIP-MuT approach, along with a comparative study against a genetic algorithm (GA) and random testing. The RIP-MuT approach achieved a mean mutation score 18.25% higher than the GA and 35.93% higher than random testing. The proposed method for detecting equivalent mutants proved feasible for reducing the cost of this activity, since it achieved a precision of 75.05% in suggesting equivalent mutants. The results therefore indicate that the approach produces effective test data able to strongly kill the majority of mutants in C programs, and that it can also help to suggest equivalent mutants correctly.
Mutation testing; Search-based software testing; Software testing; Test data generation
47	Seleção entre estratégias de geração automática de dados de teste por meio de métricas estáticas de softwares orientados a objetos / Selection between whole test generation strategies by analysing object oriented software static metrics
Ramos, Gustavo da Mota. 09 October 2018
Software products of varying complexity are created daily by eliciting complex and varied demands under tight deadlines. High levels of quality are expected of these products; yet as products become more complex, quality may fall below an acceptable level because the time available for testing does not keep pace with the complexity. Software testing and automatic test data generation therefore aim to deliver high-quality products through low-cost, rapid testing activities. In this context, however, developers depend on automatic test generation strategies and, above all, on selecting the most suitable technique to achieve the highest possible code coverage. This matters because each test data generation technique has particularities and problems that make it better suited to certain types of software. Given this scenario, this work proposes selecting the appropriate technique for each class of a software system based on the class's characteristics, expressed as object-oriented software metrics, using the Naive Bayes classification algorithm. A literature review of two generation algorithms, random search and genetic search, was carried out to understand their advantages and disadvantages in both implementation and execution. The CK metrics were also studied to understand how they can best describe the characteristics of a class. The knowledge acquired made it possible to collect, for each class, test generation data such as code coverage and generation time for each technique, together with the CK metrics, and then to analyse these data jointly and finally run the classification algorithm. The results showed that a reduced, selected set of CK metrics is more efficient and describes the characteristics of a class better than the complete set. The results also indicate that the CK metrics have little or no influence on the generation time of the test data or on the random search algorithm. However, the CK metrics showed a moderate correlation with, and influence on, the selection of the genetic algorithm, and thus contribute to its selection by the Naive Bayes algorithm.
Genetic algorithm; Test coverage; Code coverage; Test generation; CK metrics; Naive Bayes; Software testing; Test data generation
48	Seleção entre estratégias de geração automática de dados de teste por meio de métricas estáticas de softwares orientados a objetos / Selection between whole test generation strategies by analysing object oriented software static metrics
Gustavo da Mota Ramos. 09 October 2018
Software products of varying complexity are created daily by eliciting complex and varied demands under tight deadlines. High levels of quality are expected of these products; yet as products become more complex, quality may fall below an acceptable level because the time available for testing does not keep pace with the complexity. Software testing and automatic test data generation therefore aim to deliver high-quality products through low-cost, rapid testing activities. In this context, however, developers depend on automatic test generation strategies and, above all, on selecting the most suitable technique to achieve the highest possible code coverage. This matters because each test data generation technique has particularities and problems that make it better suited to certain types of software. Given this scenario, this work proposes selecting the appropriate technique for each class of a software system based on the class's characteristics, expressed as object-oriented software metrics, using the Naive Bayes classification algorithm. A literature review of two generation algorithms, random search and genetic search, was carried out to understand their advantages and disadvantages in both implementation and execution. The CK metrics were also studied to understand how they can best describe the characteristics of a class. The knowledge acquired made it possible to collect, for each class, test generation data such as code coverage and generation time for each technique, together with the CK metrics, and then to analyse these data jointly and finally run the classification algorithm. The results showed that a reduced, selected set of CK metrics is more efficient and describes the characteristics of a class better than the complete set. The results also indicate that the CK metrics have little or no influence on the generation time of the test data or on the random search algorithm. However, the CK metrics showed a moderate correlation with, and influence on, the selection of the genetic algorithm, and thus contribute to its selection by the Naive Bayes algorithm.
Genetic algorithm; Test coverage; Test generation; CK metrics; Naive Bayes; Software testing; Code coverage; Test data generation
49	Uma abordagem para geração de dados de teste para o teste de mutação utilizando técnicas baseadas em busca / An approach for test data generation in mutation testing using search-based techniques
Francisco Carlos Monteiro Souza. 24 May 2017
Mutation testing is a powerful test criterion for detecting faults and measuring the effectiveness of a test data set. However, it is a computationally expensive testing technique. The high cost comes mainly from the effort needed to generate test data adequate to kill the mutants and from the existence of equivalent mutants. This thesis presents an approach called Reach, Infect and Propagation to Mutation Testing (RIP-MuT) that generates test data and suggests equivalent mutants. The approach is composed of two modules: (i) automated test data generation using hill climbing and a fitness scheme based on the Reach, Infect, and Propagate (RIP) conditions; and (ii) a method that suggests equivalent mutants based on analysis of the RIP conditions during test data generation. Experiments were conducted to evaluate the effectiveness of the RIP-MuT approach, along with a comparative study against a genetic algorithm (GA) and random testing. The RIP-MuT approach achieved a mean mutation score 18.25% higher than the GA and 35.93% higher than random testing. The proposed method for detecting equivalent mutants proved feasible for reducing the cost of this activity, since it achieved a precision of 75.05% in suggesting equivalent mutants. The results therefore indicate that the approach produces effective test data able to strongly kill the majority of mutants in C programs, and that it can also help to suggest equivalent mutants correctly.
Mutation testing; Search-based software testing; Software testing; Test data generation
50	Generation of Synthetic Data with Generative Adversarial Networks
Garcia Torres, Douglas. January 2018
The aim of synthetic data generation is to provide data that is not real for cases where the use of real data is somehow limited: for example, when larger volumes of data are needed, when the data is sensitive to use, or simply when it is hard to get access to the real data. Traditional methods of synthetic data generation use techniques that do not intend to replicate important statistical properties of the original data. Properties such as the distribution, the patterns or the correlation between variables are often omitted. Moreover, most existing tools and approaches require a great deal of user-defined rules and do not make use of advanced techniques like Machine Learning or Deep Learning. Machine Learning is an innovative area of Artificial Intelligence and Computer Science that uses statistical techniques to give computers the ability to learn from data, while Deep Learning is a closely related field based on learning data representations, which may prove useful for the task of synthetic data generation. This thesis focuses on one of the most interesting and promising innovations of recent years in the Machine Learning community: Generative Adversarial Networks. An approach for generating discrete, continuous or text synthetic data with Generative Adversarial Networks is proposed, tested, evaluated and compared with a baseline approach. The results demonstrate the feasibility of this framework and show its advantages and disadvantages. Despite its high demand for computational resources, a Generative Adversarial Networks framework is capable of generating quality synthetic data that preserves the statistical properties of a given dataset.
Synthetic Data Generation; Generative Adversarial Networks; Machine Learning; Deep Learning; Neural Networks; Computer and Information Sciences
