• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 46
  • 14
  • 4
  • 2
  • 1
  • 1
  • 1
  • Tagged with
  • 84
  • 84
  • 34
  • 33
  • 30
  • 20
  • 15
  • 15
  • 14
  • 13
  • 13
  • 12
  • 12
  • 12
  • 12
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
31

GAN-Based Approaches for Generating Structured Data in the Medical Domain

Abedi, Masoud, Hempel, Lars, Sadeghi, Sina, Kirsten, Toralf 03 November 2023 (has links)
Modern machine and deep learning methods require large datasets to achieve reliable and robust results. This requirement is often difficult to meet in the medical field, due to data sharing limitations imposed by privacy regulations or the presence of a small number of patients (e.g., rare diseases). To address this data scarcity and to improve the situation, novel generative models such as Generative Adversarial Networks (GANs) have been widely used to generate synthetic data that mimic real data by representing features that reflect health-related information without reference to real patients. In this paper, we consider several GAN models to generate synthetic data used for training binary (malignant/benign) classifiers, and compare their performances in terms of classification accuracy with cases where only real data are considered. We aim to investigate how synthetic data can improve classification accuracy, especially when a small amount of data is available. To this end, we have developed and implemented an evaluation framework where binary classifiers are trained on extended datasets containing both real and synthetic data. The results show improved accuracy for classifiers trained with generated data from more advanced GAN models, even when limited amounts of original data are available.
32

An automatic test data generation from UML state diagram using genetic algorithm.

Doungsa-ard, Chartchai, Dahal, Keshav P., Hossain, M. Alamgir, Suwannasart, T. January 2007 (has links)
Software testing is a part of software development process. However, this part is the first one to miss by software developers if there is a limited time to complete the project. Software developers often finish their software construction closed to the delivery time, they usually don¿t have enough time to create effective test cases for testing their programs. Creating test cases manually is a huge work for software developers in the rush hours. A tool which automatically generates test cases and test data can help the software developers to create test cases from software designs/models in early stage of the software development (before coding). Heuristic techniques can be applied for creating quality test data. In this paper, a GA-based test data generation technique has been proposed to generate test data from UML state diagram, so that test data can be generated before coding. The paper details the GA implementation to generate sequences of triggers for UML state diagram as test cases. The proposed algorithm has been demonstrated manually for an example of a vending machine.
33

Test data generation from UML state machine diagrams using GAs

Doungsa-ard, Chartchai, Dahal, Keshav P., Hossain, M. Alamgir, Suwannasart, T. January 2008 (has links)
Automatic test data generation helps testers to validate software against user requirements more easily. Test data can be generated from many sources; for example, experience of testers, source program, or software specification. Selecting a proper test data set is a decision making task. Testers have to decide what test data that they should use, and a heuristic technique is needed to solve this problem automatically. In this paper, we propose a framework for generating test data from software specifications. The selected specification is Unified Modeling Language (UML) state machine diagram. UML state machine diagram describes a system in term of state which can be changed when there is an action occurring in the system. The generated test data is a sequence of these actions. These sequences of action help testers to know how they should test the system. The quality of generated test data is measured by the number of transitions which is fired using the test data. The more transitions test data can fire, the better quality of test data is. The number of coverage transitions is also used as a feedback for a heuristic search for a better test set. Genetic algorithms (GAs) are selected for searching the best test data. Our experimental results show that the proposed GA-based approach can work well for generating test data for some types of UML state machine diagrams.
34

The Importance of Data in RF Machine Learning

Clark IV, William Henry 17 November 2022 (has links)
While the toolset known as Machine Learning (ML) is not new, several of the tools available within the toolset have seen revitalization with improved hardware, and have been applied across several domains in the last two decades. Deep Neural Network (DNN) applications have contributed to significant research within Radio Frequency (RF) problems over the last decade, spurred by results in image and audio processing. Machine Learning (ML), and Deep Learning (DL) specifically, are driven by access to relevant data during the training phase of the application due to the learned feature sets that are derived from vast amounts of similar data. Despite this critical reliance on data, the literature provides insufficient answers on how to quantify the data training needs of an application in order to achieve a desired performance. This dissertation first aims to create a practical definition that bounds the problem space of Radio Frequency Machine Learning (RFML), which we take to mean the application of Machine Learning (ML) as close to the sampled baseband signal directly after digitization as is possible, while allowing for preprocessing when reasonably defined and justified. After constraining the problem to the Radio Frequency Machine Learning (RFML) domain space, an understanding of what kinds of Machine Learning (ML) have been applied as well as the techniques that have shown benefits will be reviewed from the literature. With the problem space defined and the trends in the literature examined, the next goal aims at providing a better understanding for the concept of data quality through quantification. This quantification helps explain how the quality of data: affects Machine Learning (ML) systems with regard to final performance, drives required data observation quantity within that space, and impacts can be generalized and contrasted. With the understanding of how data quality and quantity can affect the performance of a system in the Radio Frequency Machine Learning (RFML) space, an examination of the data generation techniques and realizations from conceptual through real-time hardware implementations are discussed. Consequently, the results of this dissertation provide a foundation for estimating the investment required to realize a performance goal within a Deep Learning (DL) framework as well as a rough order of magnitude for common goals within the Radio Frequency Machine Learning (RFML) problem space. / Doctor of Philosophy / Machine Learning (ML) is a powerful toolset capable of solving difficult problems across many domains. A fundamental part of this toolset is the representative data used to train a system. Unlike the domains of image or audio processing, for which datasets are constantly being developed thanks to usage agreements with entities such as Facebook, Google, and Amazon, the field of Machine Learning (ML) within the Radio Frequency (RF) domain, or Radio Frequency Machine Learning (RFML), does not have access to such crowdsourcing means of creating labeled datasets. Therefore data within the Radio Frequency Machine Learning (RFML) problem space must be intentionally cultivated to address the target problem. This dissertation explains the problem space of Radio Frequency Machine Learning (RFML) and then quantifies the effect of quality on data used during the training of Radio Frequency Machine Learning (RFML) systems. Taking this one step further, the work then goes on to provide a means of estimating data quantity needs to achieve high levels of performance based on the current Deep Learning (DL) approach to solve the problem, which in turn can be used as guidance to better refine the approach when the real-world data quantity requirements exceed practical acquisition levels. Finally, the problem of data generation is examined and provides context for the difficulties associated with procuring high quality data for problems in the Radio Frequency Machine Learning (RFML) space.
35

Multivariate Time Series Data Generation using Generative Adversarial Networks : Generating Realistic Sensor Time Series Data of Vehicles with an Abnormal Behaviour using TimeGAN

Nord, Sofia January 2021 (has links)
Large datasets are a crucial requirement to achieve high performance, accuracy, and generalisation for any machine learning task, such as prediction or anomaly detection, However, it is not uncommon for datasets to be small or imbalanced since gathering data can be difficult, time-consuming, and expensive. In the task of collecting vehicle sensor time series data, in particular when the vehicle has an abnormal behaviour, these struggles are present and may hinder the automotive industry in its development. Synthetic data generation has become a growing interest among researchers in several fields to handle the struggles with data gathering. Among the methods explored for generating data, generative adversarial networks (GANs) have become a popular approach due to their wide application domain and successful performance. This thesis focuses on generating multivariate time series data that are similar to vehicle sensor readings from the air pressures in the brake system of vehicles with an abnormal behaviour, meaning there is a leakage somewhere in the system. A novel GAN architecture called TimeGAN was trained to generate such data and was then evaluated using both qualitative and quantitative evaluation metrics. Two versions of this model were tested and compared. The results obtained proved that both models learnt the distribution and the underlying information within the features of the real data. The goal of the thesis was achieved and can become a foundation for future work in this field. / När man applicerar en modell för att utföra en maskininlärningsuppgift, till exempel att förutsäga utfall eller upptäcka avvikelser, är det viktigt med stora dataset för att uppnå hög prestanda, noggrannhet och generalisering. Det är dock inte ovanligt att dataset är små eller obalanserade eftersom insamling av data kan vara svårt, tidskrävande och dyrt. När man vill samla tidsserier från sensorer på fordon är dessa problem närvarande och de kan hindra bilindustrin i dess utveckling. Generering av syntetisk data har blivit ett växande intresse bland forskare inom flera områden som ett sätt att hantera problemen med datainsamling. Bland de metoder som undersökts för att generera data har generative adversarial networks (GANs) blivit ett populärt tillvägagångssätt i forskningsvärlden på grund av dess breda applikationsdomän och dess framgångsrika resultat. Denna avhandling fokuserar på att generera flerdimensionell tidsseriedata som liknar fordonssensoravläsningar av lufttryck i bromssystemet av fordon med onormalt beteende, vilket innebär att det finns ett läckage i systemet. En ny GAN modell kallad TimeGAN tränades för att genera sådan data och utvärderades sedan både kvalitativt och kvantitativt. Två versioner av denna modell testades och jämfördes. De erhållna resultaten visade att båda modellerna lärde sig distributionen och den underliggande informationen inom de olika signalerna i den verkliga datan. Målet med denna avhandling uppnåddes och kan lägga grunden för framtida arbete inom detta område.
36

Search-based software engineering : a search-based approach for testing from extended finite state machine (EFSM) models

Kalaji, Abdul Salam January 2010 (has links)
The extended finite state machine (EFSM) is a powerful modelling approach that has been applied to represent a wide range of systems. Despite its popularity, testing from an EFSM is a substantial problem for two main reasons: path feasibility and path test case generation. The path feasibility problem concerns generating transition paths through an EFSM that are feasible and satisfy a given test criterion. In an EFSM, guards and assignments in a path‟s transitions may cause some selected paths to be infeasible. The problem of path test case generation is to find a sequence of inputs that can exercise the transitions in a given feasible path. However, the transitions‟ guards and assignments in a given path can impose difficulties when producing such data making the range of acceptable inputs narrowed down to a possibly tiny range. While search-based approaches have proven efficient in automating aspects of testing, these have received little attention when testing from EFSMs. This thesis proposes an integrated search-based approach to automatically test from an EFSM. The proposed approach generates paths through an EFSM that are potentially feasible and satisfy a test criterion. Then, it generates test cases that can exercise the generated feasible paths. The approach is evaluated by being used to test from five EFSM cases studies. The achieved experimental results demonstrate the value of the proposed approach.
37

Diversified Ensemble Classifiers for Highly Imbalanced Data Learning and their Application in Bioinformatics

DING, ZEJIN 07 May 2011 (has links)
In this dissertation, the problem of learning from highly imbalanced data is studied. Imbalance data learning is of great importance and challenge in many real applications. Dealing with a minority class normally needs new concepts, observations and solutions in order to fully understand the underlying complicated models. We try to systematically review and solve this special learning task in this dissertation.We propose a new ensemble learning framework—Diversified Ensemble Classifiers for Imbal-anced Data Learning (DECIDL), based on the advantages of existing ensemble imbalanced learning strategies. Our framework combines three learning techniques: a) ensemble learning, b) artificial example generation, and c) diversity construction by reversely data re-labeling. As a meta-learner, DECIDL utilizes general supervised learning algorithms as base learners to build an ensemble committee. We create a standard benchmark data pool, which contains 30 highly skewed sets with diverse characteristics from different domains, in order to facilitate future research on imbalance data learning. We use this benchmark pool to evaluate and compare our DECIDL framework with several ensemble learning methods, namely under-bagging, over-bagging, SMOTE-bagging, and AdaBoost. Extensive experiments suggest that our DECIDL framework is comparable with other methods. The data sets, experiments and results provide a valuable knowledge base for future research on imbalance learning. We develop a simple but effective artificial example generation method for data balancing. Two new methods DBEG-ensemble and DECIDL-DBEG are then designed to improve the power of imbalance learning. Experiments show that these two methods are comparable to the state-of-the-art methods, e.g., GSVM-RU and SMOTE-bagging. Furthermore, we investigate learning on imbalanced data from a new angle—active learning. By combining active learning with the DECIDL framework, we show that the newly designed Active-DECIDL method is very effective for imbalance learning, suggesting the DECIDL framework is very robust and flexible.Lastly, we apply the proposed learning methods to a real-world bioinformatics problem—protein methylation prediction. Extensive computational results show that the DECIDL method does perform very well for the imbalanced data mining task. Importantly, the experimental results have confirmed our new contributions on this particular data learning problem.
38

Knowledge management infrastructure and knowledge sharing: The case of a large fast moving consumer goods distribution centre in the Western Cape

George, Chadrick Hendrik January 2014 (has links)
Magister Commercii - MCom / The aim of this study is to understand how knowledge is created, shared and used within the fast moving consumer goods distribution centre in the Western Cape (WC). It also aims to understand knowledge sharing between individuals in the organisation. A literature review was conducted, in order to answer the research questions- this covered the background of knowledge management (KM) and KS and its current status with particular reference to SA’s private sector. The study found that technological KM infrastructure, cultural KM infrastructure and organisational KM infrastructure are important enablers of KS. A conceptual model was developed around these concepts. In order to answer the research questions, the study identified a FMCG DC in the WC, where KS is practiced
39

Génération automatique de tests unitaires avec Praspel, un langage de spécification pour PHP / The art of contract-based testiong in PHP with Praspel

Enderlin, Ivan 16 July 2014 (has links)
Les travaux présentés dans ce mémoire portent sur la validation de programmes PHP à travers un nouveau langage de spécification, accompagné de ses outils. Ces travaux s’articulent selon trois axes : langage de spécification, génération automatique de données de test et génération automatique de tests unitaires.La première contribution est Praspel, un nouveau langage de spécification pour PHP, basé sur la programmation par contrat. Praspel spécifie les données avec des domaines réalistes, qui sont des nouvelles structures permettant de valider etgénérer des données. À partir d’un contrat écrit en Praspel, nous pouvons faire du Contract-based Testing, c’est à dire exploiter les contrats pour générer automatiquement des tests unitaires. La deuxième contribution concerne la génération de données de test. Pour les booléens, les entiers et les réels, une génération aléatoire uniforme est employée. Pour les tableaux, un solveur de contraintes a été implémenté et utilisé. Pour les chaînes de caractères, un langage de description de grammaires avec un compilateur de compilateurs LL(⋆) et plusieurs algorithmes de génération de données sont employés. Enfin, la génération d’objets est traitée.La troisième contribution définit des critères de couverture sur les contrats.Ces derniers fournissent des objectifs de test. Toutes ces contributions ont été implémentées et expérimentées dans des outils distribués à la communauté PHP. / The works presented in this memoir are about the validation of PHPprograms through a new specification language, along with its tools. These works follow three axes: specification language, automatic test data generation and automatic unit test generation. The first contribution is Praspel, a new specification language for PHP, based on the Design by Contract. Praspel specifies data with realistic domains, which are new structures allowing to validate and generate data. Based on a contract, we are able to perform Contract-based Testing, i.e.using contracts to automatically generate unit tests. The second contribution isabout test data generation. For booleans, integers and floating point numbers, auniform random generation is used. For arrays, a dedicated constraint solver has been implemented and used. For strings, a grammar description language along with an LL(⋆) compiler compiler and several algorithms for data generation are used. Finally, the object generation is supported. The third contribution defines contract coverage criteria. These latters provide test objectives. All these contributions are implemented and experimented into tools distributed to the PHP community.
40

Development of Artificial Intelligence-based In-Silico Toxicity Models. Data Quality Analysis and Model Performance Enhancement through Data Generation.

Malazizi, Ladan January 2008 (has links)
Toxic compounds, such as pesticides, are routinely tested against a range of aquatic, avian and mammalian species as part of the registration process. The need for reducing dependence on animal testing has led to an increasing interest in alternative methods such as in silico modelling. The QSAR (Quantitative Structure Activity Relationship)-based models are already in use for predicting physicochemical properties, environmental fate, eco-toxicological effects, and specific biological endpoints for a wide range of chemicals. Data plays an important role in modelling QSARs and also in result analysis for toxicity testing processes. This research addresses number of issues in predictive toxicology. One issue is the problem of data quality. Although large amount of toxicity data is available from online sources, this data may contain some unreliable samples and may be defined as of low quality. Its presentation also might not be consistent throughout different sources and that makes the access, interpretation and comparison of the information difficult. To address this issue we started with detailed investigation and experimental work on DEMETRA data. The DEMETRA datasets have been produced by the EC-funded project DEMETRA. Based on the investigation, experiments and the results obtained, the author identified a number of data quality criteria in order to provide a solution for data evaluation in toxicology domain. An algorithm has also been proposed to assess data quality before modelling. Another issue considered in the thesis was the missing values in datasets for toxicology domain. Least Square Method for a paired dataset and Serial Correlation for single version dataset provided the solution for the problem in two different situations. A procedural algorithm using these two methods has been proposed in order to overcome the problem of missing values. Another issue we paid attention to in this thesis was modelling of multi-class data sets in which the severe imbalance class samples distribution exists. The imbalanced data affect the performance of classifiers during the classification process. We have shown that as long as we understand how class members are constructed in dimensional space in each cluster we can reform the distribution and provide more knowledge domain for the classifier.

Page generated in 0.1094 seconds