1 |
Active Learning with Combinatorial CoverageKatragadda, Sai Prathyush 04 August 2022 (has links)
Active learning is a practical field of machine learning as labeling data or determining which data to label can be a time consuming and inefficient task. Active learning automates the process of selecting which data to label, but current methods are heavily model reliant. This has led to the inability of sampled data to be transferred to new models as well as issues with sampling bias. Both issues are of crucial concern in machine learning deployment. We propose active learning methods utilizing Combinatorial Coverage to overcome these issues.
The proposed methods are data-centric, and through our experiments we show that the inclusion of coverage in active learning leads to sampling data that tends to be the best in transferring to different models and has a competitive sampling bias compared to benchmark methods. / Master of Science / Machine learning (ML) models are being used frequently in a variety of applications. For the model to be able to learn, data is required. Processing this data is often one of the most, if not the most, time consuming aspects of utilizing ML. One especially burdensome aspect of data processing is data labeling, or determining what each data point corresponds to in terms of real world class. For example, determining if a data point that is an image contains a plane or bird. This way the ML model can learn from the data.
Active learning is a sub-field of machine learning which aims to ease this burden by allowing the model to select which data would be most beneficial to label, so that the entirety of the dataset does not need to be labeled. The issue with current active learning methods is that they are highly model dependent. In machine learning deployment the model being used may change while data stays the same, so this model dependency can cause for data points we label with respect to one model to not be ideal for another model. This model dependency has led to sampling bias issues as well; points which are chosen to be labeled may all be similar or not representative of all data resulting in the ML model not being as knowledgeable as possible.
Relevant work has focused on the sampling bias issue, and several methods have been proposed to combat this issue. Few of the methods are applicable to any type of ML model though. The issue of sampled points not generalizing to different models has been studied but no solutions have been proposed.
In this work we present active learning methods using Combinatorial Coverage. Combinatorial Coverage is a statistical technique from the field of Design of Experiments, and has commonly been used to design test sets. The extension of Combinatorial Coverage to ML is newer, and provides a way to focus on the data. We show that this data focused approach to active learning achieves a better performance when the sampled data is used for a different model and that it achieves a competitive sampling bias compared to benchmark methods.
|
2 |
Measuring Combinatorial Coverage of Manual TestingFifo, Miraldi January 2018 (has links)
Introduction: Software testing is a very important activity which assures the quality of the software under test. It becomes crucial in safety-critical systems, where an unexpected behavior of the software can even cause loss of human life or environmental disasters. However, in such complex systems it becomes infeasible to test all possible software scenarios for possible faults. Experience shows that software faults, which can cause unexpected software behavior, are caused by the interactions of variables of the tests. Combinatorial testing is the technique which focuses on the variable interactions of the tests and aims to reduce the number of tests needed to cover all software scenarios while still preserving a high fault detection rate. Background: Manual testing is the technique used to assure software quality in Bombardier Transportation AB, a Swedish company whose focus is on rail transport and development of trains. Since this process depends on the skills of the engineers, it can result in a large portion of tests not created and consequently in a large number of scenarios uncovered with tests. Therefore, combinatorial testing technique is used to measure the combinatorial coverage of these tests created from experienced engineers. Many comparisons of manual testing and other testing techniques in terms of test coverage, code coverage or mutation analysis. To the best of our knowledge there are no other studies in literature that have measured the combinatorial coverage of manual tests designed from experienced engineers, for different strength interactions of the variables of the tests nor other available tools that generate the number of missing tests to achieve full combinatorial coverage for specific interactions. Aim: The goal of this thesis is to answer the two research questions: RQ1. What is the combinatorial coverage achieved by tests manually created by experienced engineers in industry? RQ2. Can the effectiveness of manually created tests be improved in terms of combinatorial coverage using combinatorial testing? Method: In this thesis we investigate the combinatorial coverage of manually created tests by engineers working in industry and the implications of using combinatorial testing in practice. The Combinatorial Coverage Measurement2(CCM) NIST tool is used to measure the test coverage achieved. The research questions are answered by the following steps: 1) Review the scientific literature for related work, 2) Refine thesis research questions based on the previous step, 3) Propose the case study design and perform the measurements needed for data analysis, 4) and the results are analyzed and discussed in terms of test efficiency (i.e., number of test cases) and effectiveness (i.e., achieved combinatorial coverage). Results: The 2-way interaction combinatorial coverage achieved by manual tests is 78.6% on average, 57% for 3-way combinatorial coverage, 40.2% for 4-way combinatorial coverage, 20.2% for 5-way combinatorial coverage and 13% for 6-way combinatorial coverage. Full combinatorial coverage can be achieved for 2-way and 3-way interactions by adding eight and 66 missing tests on average, respectively. For 4-way interactions full combinatorial coverage can achieved by adding 658 missing tests. For 5-way and 6-way interactions full combinatorial coverage can be achieved by adding 5163 and 6170 missing tests on average respectively. Conclusion: The combinatorial coverage decreases as the strength of interactions increases. The effectiveness of the tests can be improved for 2-way and 3-way interactions and fully or partially improved for 4-way interactions, depending on the decision of engineers to add all missing tests or part of them, since the number of missing tests is increasing significantly, thus resulting in a very large number of tests to be added. It is not particularly efficient to improve test effectiveness by augmenting manual tests cases using combinatorial testing for higher strength interactions, since in most of the tests suites we studied one would need to generate additional 10.000 missing tests. This is explained by the exponential growth of the number of variable combination for such interactions.
|
Page generated in 0.1076 seconds