11 |
Quantitative Analysis of Exploration Schedules for Symbolic Execution / Kvantitativ analys av utforskningsscheman för Symbolisk ExekveringKaiser, Christoph January 2017 (has links)
Due to complexity in software, manual testing is not enough to cover all relevant behaviours of it. A different approach to this problem is Symbolic Execution. Symbolic Execution is a software testing technique that tests all possible inputs of a program in the hopes of finding all bugs. Due to the often exponential increase in possible program paths, Symbolic Execution usually cannot exhaustively test a program. To nevertheless cover the most important or error prone areas of a program, search strategies that prioritize these areas are used. Such search strategies navigate the program execution tree, analysing which paths seem interesting enough to execute and which to prune. These strategies are typically grouped into two categories, general purpose searchers, with no specific target but the aim to cover the whole program and targeted searchers which can be directed towards specific areas of interest. To analyse how different searching strategies in Symbolic Execution affect the finding of errors and how they can be combined to improve the general outcome, the experiments conducted consist of several different searchers and combinations of them, each run on the same set of test targets. This set of test targets contains amongst others one of the most heavily tested sets of open source tools, the GNU Coreutils. With these, the different strategies are compared in distinct categories like the total number of errors found or the percentage of covered code. With the results from this thesis the potential of targeted searchers is shown, with an example implementation of the Pathscore-Relevance strategy. Further, the results obtained from the conducted experiments endorse the use of combinations of search strategies. It is also shown that, even simple combinations of strategies can be highly effective. For example, interleaving strategies can provide good results even if the underlying searchers might not perform well by themselves. / På grund av programvarukomplexitet är manuell testning inte tillräcklig för att täcka alla relevanta beteenden av programvaror. Ett annat tillvägagångssätt till detta problem är Symbolisk Exekvering (Symbolic Execution). Symbolisk Exekvering är en mjukvarutestningsteknik som testar alla möjliga inmatningari ett program i hopp om att hitta alla buggar. På grund av den ofta exponentiella ökningeni möjliga programsökvägar kan Symbolisk Exekvering vanligen inte uttömmande testa ettprogram. För att ändå täcka de viktigaste eller felbenägna områdena i ett program, används sökstrategier som prioriterar dessa områden. Sådana sökstrategier navigerar i programexekveringsträdet genom att analysera vilka sökvägar som verkar intressanta nog att utföra och vilka att beskära. Dessa strategier grupperas vanligtvis i två kategorier, sökare med allmänt syfte, utan något specifikt mål förutom att täcka hela programmet, och riktade sökare som kan riktas mot specifika intresseområden. För att analysera hur olika sökstrategier i Symbolisk Exekvering påverkar upptäckandetav fel och hur de kan kombineras för att förbättra det allmänna utfallet, bestod de experimentsom utfördes av flera olika sökare och kombinationer av dem, som alla kördes på samma uppsättning av testmål. Denna uppsättning av testmål innehöll bland annat en av de mest testade uppsättningarna av öppen källkod-verktyg, GNU Coreutils. Med dessa jämfördes de olika strategierna i distinkta kategorier såsom det totala antalet fel som hittats eller procenttalet av täckt kod. Med resultaten från denna avhandling visas potentialen hos riktade sökare, med ett exempeli form av implementeringen av Pathscore-Relevance strategin. Vidare stöder resultaten som erhållits från de utförda experimenten användningen av sökstrategikombinationer. Det visas också att även enkla kombinationer av strategier kan vara mycket effektiva.Interleaving-strategier kan till exempel ge bra resultat även om de underliggande sökarna kanske inte fungerar bra själva.
|
12 |
Usage of Generative AI Based Plugin in Unit Testing : Evaluating the Trustworthiness of Generated Test Cases by Codiumate, an IDE Plugin Powered by GPT-3.5 & 4Nazari, Ali Reza, Nannicha Thunell, Bow January 2024 (has links)
Background: Unit testing is essential in software development, ensuring the functionality of individual components like functions and classes. However, manual creation of unit test cases is time-consuming and tedious, impacting testing efficiency and reliability. Problem: Automated unit test generation tools such as EvoSuite and Randoop have addressed some challenges, but they’re limited by language specificity and predefined algorithms. Generative AI tools like ChatGPT and GitHub Copilot powered by OpenAI’sGPT-3.5/4 offer alternatives, but face limitations like user input reliance and operational inconveniences. Solution: CodiumAI’s Codiumate IDE plugin aims to mitigate these limitations, making code quality assurance easier for developers. This study evaluates Codiumate’s trustworthiness in generating unit tests for the Python functions. Method: We randomly selected thirty functions from OpenAI’s HumanEval dataset, and wrote selection criteria for relevant test cases based on each function’s doc string to evaluate Codiumate’s trustworthiness using metrics such as Relevance Score, false positive rate, and result consistency rate. Result: Among all the suggested test cases by Codiumate, 208 unit tests, which consists of 48% of suggested test cases that were relevant. 70% of assertions from these test cases strictly meet selection criteria, while the other 30% while relevant were selected due to our basis and experience in software testing. The average false positive rate is15%. Function groups that have higher Relevance Scores are non-mathematical nature, and simple dependencies. High false positives arise in functions with string and float parameters. All generated unit tests are syntax-error-free, with 20% fail and 80% passed in all five test execution. Conclusion: Codiumate demonstrates potential in automating unit test generation, offering a convenient means to support developers. However, it is not yet fully reliable for critical applications without developer oversight. Continued refinement and exploration of its capabilities are essential for Codiumate to become an indispensable asset in unit test generation, enhancing its trustworthiness and effectiveness in the software development process.
|
13 |
Model-Based Test Case Generation for Real-Time SystemsHessel, Anders January 2007 (has links)
Testing is the dominant verification technique used in the software industry today. The use of automatic test case execution increases, but the creation of test cases remains manual and thus error prone and expensive. To automate generation and selection of test cases, model-based testing techniques have been suggested. In this thesis two central problems in model-based testing are addressed: the problem of how to formally specify coverage criteria, and the problem of how to generate a test suite from a formal timed system model, such that the test suite satisfies a given coverage criterion. We use model checking techniques to explore the state-space of a model until a set of traces is found that together satisfy the coverage criterion. A key observation is that a coverage criterion can be viewed as consisting of a set of items, which we call coverage items. Each coverage item can be treated as a separate reachability problem. Based on our view of coverage items we define a language, in the form of parameterized observer automata, to formally describe coverage criteria. We show that the language is expressive enough to describe a variety of common coverage criteria described in the literature. Two algorithms for test case generation with observer automata are presented. The first algorithm returns a trace that satisfies all coverage items with a minimum cost. We use this algorithm to generate a test suite with minimal execution time. The second algorithm explores only states that may increase the already found set of coverage items. This algorithm works well together with observer automata. The developed techniques have been implemented in the tool CoVer. The tool has been used in a case study together with Ericsson where a WAP gateway has been tested. The case study shows that the techniques have industrial strength.
|
14 |
Elaboration d'une approche de vérification et de validation de logiciel embarqué automobile, basée sur la génération automatique de cas de test / Elaboration of an approach of check and validation of automobile embarked software, based on the automatic generation of case of testKangoye, Sékou 27 June 2016 (has links)
Un système embarqué est un système électronique et informatique autonome dédié à une tâche précise. Dans le secteur de l’automobile, le nombre de systèmes embarqués dans les voitures a considérablement augmenté au cours de ces dernières années et va certainement continuer à augmenter. Ces systèmes sont dédiés entre autres, à la sécurité, au confort de conduite,et à l’assistance à la conduite. Cette croissance des systèmes est associée avec une croissance en taille des logiciels qui les contrôlent. En conséquence, leur gestion(système et logiciel) devient de plus en plus complexe et problématique. Par ailleurs, la concurrence dans le secteur automobile est très féroce et les temps de mise sur le marché sont de plus en plus courts. Ainsi, pour garantir le bon fonctionnement des systèmes en général et du logiciel en particulier, étant donné leur complexité,et aussi les délais courts de mise sur le marché des produits automobiles, de nouvelles méthodes de développement doivent être considérées. Ainsi, de nombreuses méthodes de développement, incluant de nouveaux standards (de développement) et approches automatiques ont émergé au cours de ces dernières années. Dans le cas particulier de la vérification et validation de logiciel, une des activités critiques qui a connu une avancée significative est la génération de cas de test, avec l’avènement d’approches automatiques.Malgré cela, ces approches peinent souvent à s’imposer en milieu industriel. Une des raisons est que celles ci sont souvent peu adaptées ou peu utilisées dans un contexte industriel.Dans ce contexte, cette thèse vise à proposer une approche de vérification et de validation de logiciels embarqués, basée sur la génération automatique de cas de test. Pour cela, nous avons mis en place une approche permettant de représenter sous forme de modèles abstraits les spécifications d’un logiciel, puis de générer à partir de ces modèles un ensemble de cas de test en considérant en particulier le critère de couverture MC/DC. / An embedded system is a system that performs a specific task and has a computer embedded inside. In the automotive sector, the amount of embedded systems in the vehicle has risen dramatically in recent years and is set to increase. They deal essentially with safety, comfort, and driving assistance. Furthermore, the increase in number and complexity of the systems is associated with a growth in software. As a consequence, their management (system and software) have become more and more complex and problematic. Also, the competition and time-to-market in the automotive industry are very tough. Thus, to guarantee the efficiency and reliability of the embedded systems in the vehicle in general and the software in particular, in view of the complexity as well as the competition and time-to-market law, new development methods should be considered. Therefore, new development methods including new standards, and automatic approaches have emerged over the last years. In the particular case of embedded software verification and validation, one of the most critical activities that has experienced a significant progress is test case generation with the advent of automatic approaches. Despite this, these approaches are not widely used or are not well adapted in industrial context. In that context, our goal in this PhD. thesis is to propose a new verification and validation approach, based on automatic test case generation of embedded embedded. Thus, we have set up an approach that automatically generates test cases, with respect to the MC/DC criterion, from abstract models of the software specifications expressed in the form of state-transition models.
|
15 |
Testabilité des services Web / Web services testabilityRabhi, Issam 09 January 2012 (has links)
Cette thèse s’est attaquée sous diverses formes au test automatique des services Web : une première partie est consacrée au test fonctionnel à travers le test de robustesse. La seconde partie étend les travaux précédents pour le test de propriétés non fonctionnelles, telles que les propriétés de testabilité et de sécurité. Nous avons abordé ces problématiques à la fois d’un point de vue théorique et pratique. Nous avons pour cela proposé une nouvelle méthode de test automatique de robustesse des services Web non composés, à savoir les services Web persistants (stateful) et ceux non persistants. Cette méthode consiste à évaluer la robustesse d’un service Web par rapport aux opérations déclarées dans sa description WSDL, en examinant les réponses reçues lorsque ces opérations sont invoquées avec des aléas et en prenant en compte l’environnement SOAP. Les services Web persistants sont modélisés grâce aux systèmes symboliques. Notre méthode de test de robustesse dédiée aux services Web persistants consiste à compléter la spécification du service Web afin de décrire l’ensemble des comportements corrects et incorrects. Puis, en utilisant cette spécification complétée, les services Web sont testés en y intégrant des aléas. Un verdict est ensuite rendu. Nous avons aussi réalisé une étude sur la testabilité des services Web composés avec le langage BPEL. Nous avons décrit précisément les problèmes liés à l’observabilité qui réduisent la faisabilité du test de services Web. Par conséquent, nous avons évalué des facteurs de la testabilité et proposé des solutions afin d’améliorer cette dernière. Pour cela, nous avons proposé une approche permettant, en premier lieu, de transformer la spécification ABPEL en STS. Cette transformation consiste à convertir successivement et de façon récursive chaque activité structurée en un graphe de sous-activités. Ensuite, nous avons proposé des algorithmes d’améliorations permettant de réduire ces problèmes de testabilité. Finalement, nous avons présenté une méthode de test de sécurité des services Web persistants. Cette dernière consiste à évaluer quelques propriétés de sécurité, tel que l’authentification, l’autorisation et la disponibilité, grâce à un ensemble de règles. Ces règles ont été crée, avec le langage formel Nomad. Cette méthodologie de test consiste d’abord à transformer ces règles en objectifs de test en se basant sur la description WSDL, ensuite à compléter, en parallèle, la spécification du service Web persistant et enfin à effectuer le produit synchronisé afin de générer les cas de test. / This PhD thesis focuses on diverse forms of automated Web services testing : on the one hand, is dedicated to functional testing through robustness testing. On the other hand, is extends previous works on the non-functional properties testing, such as the testability and security properties. We have been exploring these issues both from a theoretical and practical perspective. We proposed a robustness testing method which generates and executes test cases automatically from WSDL descriptions. We analyze the Web service over hazards to find those which may be used for testing. We show that few hazards can be really handled and then we improve the robustness issue detection by separating the SOAP processor behavior from the Web service one. Stateful Web services are modeled with Symbolic Systems. A second method dedicated to stateful Web services consists in completing the Web service specification to describe correct and incorrect behaviors. By using this completed specification, the Web services are tested with relevant hazards and a verdict is returned. We study the BPEL testability on a well-known testability criterion called observability. To evaluate, we have chosen to transform ABPEL specifications into STS to apply existing methods. Then, from STS testability issues, we deduce some patterns of ABPEL testability degradation. These latter help to finally propose testability enhancement methods of ABPEL specifications. Finally, we proposed a security testing method for stateful Web Services. We define some specific security rules with the Nomad language. Afterwards, we construct test cases from a symbolic specification and test purposes derived from the previous rules. Moreover, to validate our proposal, we have applied our testing approach on real size case studies.
|
16 |
Proof-of-concept of Model-based testing based on an UML-model of a water-level measurement systemAlshekhly, Zoubida, Gill, Namra January 2020 (has links)
Software testing is a very important phase in software development as it minimize risks ina software system, however, it consumes time and can be very expensive. With automatictest case generation time consumption and cost can be reduced. Model-based testing isa method to test a software system with a model of the systems behaviour. Automatictest case generation is often considered a favorable support in model-based testing. In thiswork, the concept of model-based testing is explored along with testing the embedded partof a water-level measurement system (WLM) to investigate the efficiency of model-basedtesting on a software system. As a result of this, a model-based testing tool, MoMut::UMLis used to generate the test-cases on the UML model of WLM system that is built ina UML modeling environment, Eclipse-Papyrus. However, MoMut::UML implements aspecial type of model-based testing technique, model-based mutation testing; that injectsfaults in the UML model, and generates test-data on the fault-based model. By this, thebehaviour of system-under-test, only the UML model of water-level measurement system,is tested.
|
17 |
Empirical Comparison Between Conventional and AI-based Automated Unit Test Generation Tools in JavaGkikopouli, Marios, Bataa, Batjigdrel January 2023 (has links)
Unit testing plays a crucial role in ensuring the quality and reliability of software systems. However, manual testing can often be a slow and time-consuming process. With current advancements in artificial intelligence (AI), new tools have emerged for automated unit testing to address this issue. But how do these new AI tools compare to conventional automated unit test generation tools? To answer this question, we compared two state-of-the-art conventional unit test tools (EVOSUITE and RANDOOP) with the sole commercially available AI-based unit test tool (DIFFBLUE COVER) for Java. We tested them on 10 sample classes from 3 real-life projects provided by the Defects4J dataset to evaluate their performance regarding code coverage, mutation score, and fault detection. The results showed that EVOSUITE achieved the highest code coverage, averaging 89%, while RANDOOP and DIFFBLUE COVER achieved similar results, averaging 63%. In terms of mutation score, DIFFBLUE COVER had the lowest average score of 40%, while EVOSUITE and RANDOOP scored 67% and 50%, respectively. For fault detection, EVOSUITE and RANDOOP detected a higher number of bugs (7 out of 10 and 5 out of 10, respectively) compared to DIFFBLUE COVER, which found only 4 out of 10. Although the AI-based tool was outperformed in all three criteria, it still shows promise by being able to achieve adequate results, in some cases even surpassing the conventional tools while generating a significantly smaller number of total assertions and more comprehensive tests. Nonetheless, the study acknowledges its limitations in terms of the restricted number of AI-based tools used and the small number of projects utilized from Defects4J.
|
18 |
Automation in CS1 with the Factoring Problem GeneratorParker, Joshua B. 01 December 2009 (has links) (PDF)
As the field of computer science continues to grow, the number of students enrolled in related programs will grow as well. Though one-on-one tutoring is one of the more effective means of teaching, computer science instructors will have less and less time to devote to individual students. To address this growing concern, many tools that automate parts of an instructor’s job have been proposed. These tools can assist instructors in presenting concepts and grading student work, and they can help students learn to program more effectively. A growing group of intelligent tutoring systems attempts to tie all of this functionality into a single tool that is meant to be used throughout an entire CS course or series of courses.
To contribute to this emerging area, the Factoring Problem Generator (FPG) is presented in this work. The FPG creates and grades problems in C in which students search for and extract blocks of repeated code into individual functions, learning to utilize parameters and return values as they do so. The problems created by the FPG are highly configurable by instructors such that the difficulty can be finely tuned to suit students’ individual needs. Instructors can choose whether or not to include arrays, pointers, certain elemental data types, certain operators, or certain kinds of statements, among other things. The FPG is additionally capable of generating a set of test cases for each generated problem. These test cases fully exercise students’ solutions by covering all branches of execution, and they ensure that program functionality does not change as students factor code into functions.
Initial experimentation with the system has suggested that the FPG can be integrated into a beginning CS curriculum and with further refinement could become a standard tool in the CS classroom.
|
19 |
The utilization of log files generated by test executions: A systematic literature reviewGabaire, Elmi Bile January 2023 (has links)
Context: Testing is an important activity in software development and is typically estimated to account for nearly half of the efforts in the software development cycle. This puts a great demand on improving the artifacts involved in this task such as the test cases and test suites (a collection of test cases). Objective: When executing test programs, it is typical to record runtime information associated with the test cases in the form of test execution logs or traces. The aim of this work is to explore how this information can be utilized to improve the software testing process. To this end, two main aspects are investigated which are (1) in the context of test case generation and (2) in the context of different optimizations regarding existing test suites. Furthermore, the role of the logs regarding fault localization in connection with improving the existing test suites is investigated. Method: A systematic literature review is conducted to investigate, identify and analyze the existing literature on test case generation and test suite optimization that utilizes the test execution logs. Results: After a rigorous search in six digital databases, 26 primary studies were identified. 5 of the selected papers propose approaches in the context of test data generation, 8 papers suggest test case prioritization (TCP) techniques, 4 papers discuss approaches in test case selection (TCS), and 5 papers propose approaches in test suite minimization (TSM). Furthermore, we identified, 3 papers that discuss fault localization, and one paper that discussed the decomposition of large test cases into smaller single purpose test cases using the logs from previous test executions. Conclusion: The test execution logs are a useful source of information for different testing activities. Regarding test case generation, the main theme observed is the use of genetic algorithms in attempting to generate appropriate test cases when the alternative might have been to use random test data generation methods. When it comes to improving existing test suites several approaches within TCP, TCS and TSM such as similarity-based, modification-based, cluster-based, and search-based were put forward by the authors of the selected primary studies. Furthermore, several fault localization techniques using the logs were suggested.
|
20 |
An In-Depth study on the Utilization of Large Language Models for Test Case GenerationJohnsson, Nicole January 2024 (has links)
This study investigates the utilization of Large Language Models for Test Case Generation. The study uses the Large Language model and Embedding model provided by Llama, specifically Llama2 of size 7B, to generate test cases given a defined input. The study involves an implementation that uses customization techniques called Retrieval Augmented Generation (RAG) and Prompt Engineering. RAG is a method that in this study, stores organisation information locally, which is used to create test cases. This stored data is used as complementary data apart from the pre-trained data that the large language model has already trained on. By using this method, the implementation can gather specific organisation data and therefore have a greater understanding of the required domains. The objective of the study is to investigate how AI-driven test case generation impacts the overall software quality and development efficiency. This is evaluated by comparing the output of the AI-based system, to manually created test cases, as this is the company standard at the time of the study. The AI-driven test cases are analyzed mainly in the form of coverage and time, meaning that we compare to which degree the AI system can generate test cases compared to the manually created test case. Likewise, time is taken into consideration to understand how the development efficiency is affected. The results reveal that by using Retrieval Augmented Generationin combination with Prompt Engineering, the system is able to identify test cases to a certain degree. The results show that 66.67% of a specific project was identified using the AI, however, minor noise could appear and results might differ depending on the project’s complexity. Overall the results revealed how the system can positively impact the development efficiency and could also be argued to have a positive effect on the software quality. However, it is important to understand that the implementation as its current stage, is not sufficient enough to be used independently, but should rather be used as a tool to more efficiently create test cases.
|
Page generated in 0.264 seconds