51

Generation of Synthetic Data with Generative Adversarial Networks

Garcia Torres, Douglas January 2018 (has links)
The aim of synthetic data generation is to provide artificial data for cases where the use of real data is limited: for example, when larger volumes of data are needed, when the data is too sensitive to use, or simply when the real data is hard to access. Traditional methods of synthetic data generation use techniques that do not attempt to replicate important statistical properties of the original data; properties such as the distribution, the patterns, or the correlation between variables are often omitted. Moreover, most existing tools and approaches require a great many user-defined rules and do not make use of advanced techniques such as Machine Learning or Deep Learning. While Machine Learning is an innovative area of Artificial Intelligence and Computer Science that uses statistical techniques to give computers the ability to learn from data, Deep Learning is a closely related field based on learning data representations, which may prove useful for the task of synthetic data generation. This thesis focuses on one of the most interesting and promising innovations of recent years in the Machine Learning community: Generative Adversarial Networks. An approach for generating discrete, continuous, or text synthetic data with Generative Adversarial Networks is proposed, tested, evaluated, and compared with a baseline approach. The results demonstrate the feasibility of the approach and show the advantages and disadvantages of using this framework. Despite its high demand for computational resources, a Generative Adversarial Network framework is capable of generating quality synthetic data that preserves the statistical properties of a given dataset.
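To make the adversarial setup described in the abstract concrete, the sketch below shows a minimal GAN for tabular data in PyTorch. It is an illustration under assumed settings (network sizes, optimizer parameters, an eight-column placeholder dataset), not the architecture used in the thesis.

```python
# Minimal GAN sketch for tabular synthetic data (illustrative assumptions throughout).
import torch
import torch.nn as nn

NOISE_DIM, DATA_DIM = 32, 8  # assumed latent size and number of data columns

generator = nn.Sequential(
    nn.Linear(NOISE_DIM, 64), nn.ReLU(),
    nn.Linear(64, DATA_DIM),           # maps a noise vector to one synthetic row
)
discriminator = nn.Sequential(
    nn.Linear(DATA_DIM, 64), nn.LeakyReLU(0.2),
    nn.Linear(64, 1), nn.Sigmoid(),    # probability that a row is real
)

opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
bce = nn.BCELoss()

real_data = torch.randn(1000, DATA_DIM)  # placeholder for the real dataset

for step in range(200):
    batch = real_data[torch.randint(0, len(real_data), (64,))]
    fake = generator(torch.randn(64, NOISE_DIM))

    # Discriminator update: push real rows towards 1 and generated rows towards 0.
    loss_d = bce(discriminator(batch), torch.ones(64, 1)) + \
             bce(discriminator(fake.detach()), torch.zeros(64, 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Generator update: try to make the discriminator output 1 for generated rows.
    loss_g = bce(discriminator(fake), torch.ones(64, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()

synthetic_rows = generator(torch.randn(500, NOISE_DIM)).detach()  # sampled synthetic data
```

After training, the statistical fidelity of `synthetic_rows` would be checked against the real data, for example by comparing per-column distributions and correlation matrices, which is the kind of evaluation the thesis performs.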
52

Energy-Efficient Private Forecasting on Health Data using SNNs / Energieffektiv privat prognos om hälsodata med hjälp av SNNs

Di Matteo, Davide January 2022 (has links)
Health monitoring devices, such as Fitbit, are gaining popularity both as wellness tools and as a source of information for healthcare decisions. Predicting such wellness goals accurately is critical for users to make informed lifestyle choices. The core objective of this thesis is to design and implement such a system that takes energy consumption and privacy into account. The problem is modelled as a time-series forecasting task that makes use of Spiking Neural Networks (SNNs) due to their proven energy-saving capabilities. Thanks to a design that closely mimics biological neural networks (such as the brain), SNNs have the potential to significantly outperform classic Artificial Neural Networks in terms of energy consumption and robustness. To test our hypotheses, previous research by Sonia et al. [1] in the same domain and with the same dataset is used as our starting point; there, a private forecasting system using Long Short-Term Memory (LSTM) is designed and implemented. Their study also implements and evaluates a clustering federated learning approach, which fits the highly distributed data well. The results obtained in their research act as a baseline against which we compare our results in terms of accuracy, training time, model size, and estimated energy consumed. Our experiments show that Spiking Neural Networks trade off accuracy (2.19x, 1.19x, 4.13x, and 1.16x greater Root Mean Square Error (RMSE) for macronutrients, calories burned, resting heart rate, and active minutes, respectively) to grant a smaller model (19% fewer parameters and 77% lighter in memory) and 43% faster training. Our model is estimated to consume 3.36 μJ per inference, far less than traditional Artificial Neural Networks (ANNs) [2]. The data recorded by health monitoring devices is highly distributed in the real world, and such sensitive information carries many implications to consider. For these reasons, we apply the clustering federated learning implementation of [1] to our use case. Adopting such techniques can be challenging, however, since it is difficult to learn from irregular data sequences. We use a two-step streaming clustering approach to classify users based on their eating and exercise habits. Training different models for each group of users proves useful, particularly in terms of training time; however, this is strongly dependent on cluster size. Our experiments conclude that there is a decrease in error and training time if the clusters contain enough data to train the models. Finally, this study addresses the issue of data privacy by using state-of-the-art differential privacy. We apply ε-differential privacy to both our baseline model (trained on the whole dataset) and our federated learning based approach. With a privacy budget of ε = 0.1, our experiments report an increase in the measured average error (RMSE) of only 25%: specifically, +23.13%, +25.71%, +29.87%, and +21.57% for macronutrients (grams), calories burned (kcal), resting heart rate (bpm), and active minutes, respectively.
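The energy argument for SNNs rests on the sparse, binary spiking behaviour of their neurons. The following sketch simulates a single leaky integrate-and-fire (LIF) layer, the basic building block of a spiking network; the layer sizes, time constants, and thresholds are assumptions chosen for illustration, not values from the thesis.

```python
# Illustrative leaky integrate-and-fire (LIF) layer; parameters are assumed, not thesis values.
import numpy as np

def lif_forward(inputs, weights, threshold=1.0, decay=0.9):
    """Run one LIF layer over a spike train.

    inputs : (steps, n_in) binary spike train
    weights: (n_in, n_out) synaptic weights
    Returns the (steps, n_out) output spike train.
    """
    steps, n_out = inputs.shape[0], weights.shape[1]
    membrane = np.zeros(n_out)                 # membrane potential per output neuron
    out_spikes = np.zeros((steps, n_out))
    for t in range(steps):
        membrane = decay * membrane + inputs[t] @ weights  # leak, then integrate incoming spikes
        fired = membrane >= threshold                      # fire when the threshold is crossed
        out_spikes[t] = fired.astype(float)
        membrane[fired] = 0.0                              # reset neurons that fired
    return out_spikes

rng = np.random.default_rng(0)
spike_train = (rng.random((20, 16)) < 0.2).astype(float)   # sparse input spikes over 20 time steps
w = rng.normal(scale=0.5, size=(16, 4))
print(lif_forward(spike_train, w).sum(axis=0))             # spike counts per output neuron
```

Because computation only happens when spikes occur, energy per inference can be estimated from spike counts, which is the intuition behind the per-inference energy figure reported above.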
53

Test case generation using symbolic grammars and quasirandom sequences

Felix Reyes, Alejandro 06 1900 (has links)
This work presents a new test case generation methodology that offers a high degree of automation (reducing cost) while providing increased defect-detection power (increasing benefit). Our solution is a variation of model-based testing that takes advantage of symbolic grammars (context-free grammars in which terminals are replaced by regular expressions representing their solution space) and quasi-random sequences to generate test cases. Previous test case generation techniques are enhanced with adaptive random testing to maximize input-space coverage, and with selective and directed sentence-generation techniques to optimize sentence generation. Our solution was tested by generating 200 firewall policies containing up to 20 000 rules from a generic firewall grammar. Our results show that the system generates test cases with superior coverage of the input space, increasing the probability of defect detection while considerably reducing the number of test cases needed compared with previously used approaches. / Software Engineering and Intelligent Systems
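The combination of grammar-driven generation and quasi-random sampling can be illustrated with a small sketch: a Halton sequence spreads choices evenly over the grammar's alternatives, so successive test cases cover the input space more uniformly than purely random picks. The toy firewall-rule grammar and the way the Halton value selects alternatives are illustrative assumptions, not the thesis implementation.

```python
# Sketch: grammar-based test-case generation steered by a quasi-random (Halton) sequence.
def halton(index, base=2):
    """Return the index-th Halton value in the given base (a quasi-random number in [0, 1))."""
    result, f = 0.0, 1.0
    while index > 0:
        f /= base
        result += f * (index % base)
        index //= base
    return result

GRAMMAR = {  # toy firewall-rule grammar; terminals are lists of concrete alternatives
    "rule":     [["<action>", " from ", "<addr>", " port ", "<port>"]],
    "<action>": [["permit"], ["deny"]],
    "<addr>":   [["10.0.0.1"], ["192.168.1.0/24"], ["any"]],
    "<port>":   [["22"], ["80"], ["1024-65535"]],
}

def generate(symbol, index):
    """Expand `symbol`; the quasi-random value for this index picks among the alternatives."""
    alternatives = GRAMMAR.get(symbol)
    if alternatives is None:
        return symbol                                    # literal text, emit as-is
    choice = int(halton(index) * len(alternatives))
    return "".join(generate(part, index + i + 1)
                   for i, part in enumerate(alternatives[choice]))

for i in range(1, 6):
    print(generate("rule", i))   # five well-spread firewall-rule test cases
```

In practice the terminals would be regular expressions whose solution spaces are sampled, but the selection mechanism stays the same.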
54

Test case generation using symbolic grammars and quasirandom sequences

Felix Reyes, Alejandro Unknown Date
No description available.
55

Génération automatique de test pour les contrôleurs logiques programmables synchrones / Automated test generation for logical programmable synchronous controllers

Tka, Mouna 02 June 2016 (has links)
This thesis work, carried out in the context of the FUI Minalogic Bluesky project, concerns the automated functional testing of a particular class of programmable logic controllers (em4) produced by InnoVista Sensors. These are synchronous systems that are programmed by means of an integrated development environment (IDE). The people who use and program these controllers are not necessarily expert programmers, so the development of software applications must be simple and intuitive; the same should hold for testing. Even if the applications defined by these users are not necessarily highly critical, it is important to test them adequately and effectively. A simulator included in the IDE allows programmers to test their programs in a way that remains informal and interactive, by manually entering test data. Building on previous research on testing synchronous programs, we propose a new test specification language, called SPTL (Synchronous Programs Testing Language), which makes it possible to simply express test scenarios that can be executed on the fly to automatically generate test input sequences. It also allows describing the environment in which the system evolves, placing conditions on the inputs so as to obtain realistic test data and limit unnecessary ones. SPTL facilitates this testing task by introducing concepts such as user profiles, groups, and categories. We have designed and developed a prototype named "Testium", which translates an SPTL program into a set of constraints used by a Prolog solver that randomly selects the test inputs; test data generation thus relies on constraint logic programming techniques. To assess the approach, we experimented with this method on realistic and typical examples of em4 applications. Although SPTL was evaluated on em4, its use can be envisaged for the validation of other types of synchronous controllers or systems.
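The idea of turning an environment description into constraints that a solver then samples can be sketched in a few lines. The input names, domains, and the environment rule below are hypothetical; the sketch only illustrates the constrain-then-sample pattern that Testium delegates to a Prolog solver, here approximated with plain enumeration.

```python
# Sketch: constraint-based test-input generation in the spirit of SPTL/Testium.
# Domains, input names, and the environment rule are assumed for illustration.
import itertools
import random

domains = {
    "temperature": range(-10, 41),      # sensor input, degrees Celsius
    "door_open":   [0, 1],              # boolean input
    "mode":        ["eco", "normal", "boost"],
}

def environment_ok(temperature, door_open, mode):
    """Environment profile: 'boost' is only realistic when it is cold and the door is closed."""
    return not (mode == "boost" and (temperature > 5 or door_open == 1))

# Enumerate the admissible input vectors, then draw a test sequence at random.
candidates = [c for c in itertools.product(*domains.values()) if environment_ok(*c)]
random.seed(42)
for step, (temperature, door_open, mode) in enumerate(random.sample(candidates, k=5)):
    print(f"step {step}: temperature={temperature}, door_open={door_open}, mode={mode}")
```

A real constraint logic programming solver would avoid the exhaustive enumeration, but the resulting test sequences play the same role: realistic inputs that respect the declared environment.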
56

Investigating Metrics that are Good Predictors of Human Oracle Costs: An Experiment

Kartheek Arun Sai Ram, Chilla, Kavya, Chelluboina January 2017 (has links)
Context. Human oracle cost — the cost of having humans manually evaluate the correctness of the output for given test inputs — is significant and is a concern in the software test data generation field. This study was designed to assess metrics that might predict human oracle cost. Objectives. The major objective of this study is to address human oracle cost; to this end, the study identifies metrics that are good predictors of human oracle cost and can further help to solve the oracle problem. In this process, suitable metrics identified from the literature are applied to the test input to see whether they help in predicting the correctness of the output for that input. Methods. Initially, a literature review was conducted to find metrics relevant to test data; the review also sought possible code metrics that can be applied to test data. Two pilot experiments were conducted before the actual experiment. To accomplish our research objectives, an experiment was conducted at BTH with master's students as the sample population, and group interviews were then held to check whether the participants perceived any new metrics that might impact the correctness of the output. The data obtained from the experiment and the interviews were analyzed using a linear regression model in SPSS. To analyze the accuracy-versus-metric data, a linear discriminant model in SPSS was used. Results. Our literature review resulted in four metrics suitable for our study. Since our test input is HTML, we took HTML depth, size, compression size, and number of tags as our metrics. From the group interviews another four metrics were drawn, namely number of lines of code and the number of <div>, anchor <a>, and paragraph <p> tags, each as an individual metric. The linear regression model, which analyses time-versus-metric data, shows significant results, but with multicollinearity affecting the result there was no variance among the considered metrics, so the results of our study are reported after adjusting for multicollinearity. In addition, a linear discriminant model analysing accuracy-versus-metric data was used to identify the metrics that influence accuracy. The results of our study show that the metrics positively correlate with time and accuracy. Conclusions. From the time-versus-metric data, when multicollinearity is adjusted for by applying a step-wise regression reduction technique, program size, compression size, and the <div> tag count influence the time taken by the sample population. From the accuracy-versus-metric data, the number of <div> tags and the number of lines of code influence the accuracy of the sample population.
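The metrics named in the abstract are cheap to compute from the test input itself, which is what makes them candidate predictors of oracle cost. The sketch below computes HTML depth, document size, compression size, and tag count for a test document and fits an ordinary least-squares model of evaluation time against them; the sample documents and timing values are made up for illustration.

```python
# Sketch: computing the HTML metrics from the study and fitting time ~ metrics by least squares.
# The documents and timing values are fabricated examples, not data from the experiment.
import gzip
import numpy as np
from html.parser import HTMLParser

class DepthCounter(HTMLParser):
    """Counts start tags and tracks the maximum nesting depth of an HTML document."""
    def __init__(self):
        super().__init__()
        self.depth = self.max_depth = self.tags = 0
    def handle_starttag(self, tag, attrs):
        self.tags += 1
        self.depth += 1
        self.max_depth = max(self.max_depth, self.depth)
    def handle_endtag(self, tag):
        self.depth -= 1

def metrics(html):
    parser = DepthCounter()
    parser.feed(html)
    return [parser.max_depth, len(html), len(gzip.compress(html.encode())), parser.tags]

docs = ["<html><body><div><p>a</p></div></body></html>",
        "<html><body>" + "<div><a>x</a></div>" * 10 + "</body></html>"]
times = np.array([12.0, 35.0])                     # hypothetical seconds per oracle judgement

X = np.hstack([np.ones((len(docs), 1)), np.array([metrics(d) for d in docs], dtype=float)])
coefficients, *_ = np.linalg.lstsq(X, times, rcond=None)  # intercept, then one weight per metric
print(coefficients)
```

With only two documents the fit is of course degenerate; in the study the same kind of model is fitted over the full sample and then reduced step-wise to handle multicollinearity.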
57

Automated Software Testing: A Study of the State of Practice

Rafi, Dudekula Mohammad, Reddy, Kiran Moses Katam January 2012 (has links)
Context: Software testing is expensive, labor-intensive, and consumes a lot of time in the software development life cycle, so there has always been a need to decrease testing time. This has led to a focus on Automated Software Testing (AST), because with specific tools the testing effort can be dramatically reduced and the costs related to testing can decrease [11]. Manual Testing (MT) requires a lot of effort and hard work when measured in person-months [11]. Automated software testing helps to decrease the workload by delegating some testing tasks to computers, which are cheap, fast, do not get bored, and can work continuously through weekends. Because of these advantages, many researchers are working towards the automation of software testing, which can help to complete testing in less time [10]. Objectives: The main aims of this thesis are: 1) to systematically classify contributions within AST; 2) to identify the different benefits and challenges of AST; and 3) to identify whether the benefits and challenges reported in the literature are prevalent in industry. Methods: To fulfill our aims and objectives, we used a systematic mapping methodology to classify contributions within AST, a systematic literature review (SLR) to identify the different benefits and challenges of AST, and finally a web-based survey to validate the findings of the SLR. Results: The systematic mapping shows that the main aspects within AST include the purpose of automation, levels of testing, technology used, the types of research conducted, and the frequency of AST studies over time. From the SLR we identified the benefits and challenges of AST. The benefits include higher product quality, less testing time, reliability, increased confidence, reusability, less human effort, reduced cost, and increased fault detection. The challenges include failure to achieve expected goals, difficulty in maintaining test automation, the time test automation needs to mature, false expectations, and a lack of people skilled in test automation tools. The web survey shows that almost all of these benefits and challenges are prevalent in industry; the findings on fault detection and confidence, however, are contrary to the results of the SLR. The challenge concerning an appropriate test automation strategy drew 24% disagreement and 30% uncertainty from respondents, since the automation strategy depends entirely on the test manager of the project. When asked whether automated software testing fully replaces manual testing, 80% of respondents disagreed. Conclusion: The classification of AST studies using systematic mapping gives an overview of the work done in the area and helps to find the research coverage of AST; researchers can use the gaps found in the mapping study to guide future work. The results of the SLR and the web survey clearly show that practitioners recognize the benefits and challenges of AST reported in the literature.
58

Material Artefact Generation

Rončka, Martin January 2019 (has links)
It is not always easy to obtain a sufficiently large and high-quality dataset of images with clearly visible artefacts, whether because of scarcity on the data-source side or the complexity of creating annotations. This holds, for example, for radiology and also for mechanical engineering. To be able to use modern, well-established machine learning methods for classification, segmentation, and defect detection, the dataset needs to be sufficiently large and balanced. With small datasets we face problems such as overfitting and weak data, which cause misclassification at the expense of under-represented classes. This thesis explores the use of generative networks to extend and balance a dataset with newly generated images. Using Conditional Generative Adversarial Networks (CGAN) and a heuristic annotation generator, we are able to generate a large number of new images of parts with defects. A dataset of threads was used for the generation experiments, and two further datasets were used as well: ceramics and MRI scans (BraTS). On these two datasets we evaluate the influence of the generated data on training and the benefit for improving classification and segmentation.
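The key difference from an ordinary GAN is that the generator is conditioned on a class label, so extra samples can be produced on demand for whichever defect class is under-represented. The sketch below shows only this conditioning idea; the layer sizes, image resolution, and number of defect classes are assumptions for illustration, not the architecture used in the thesis.

```python
# Sketch of CGAN-style conditioning for dataset balancing (illustrative assumptions throughout).
import torch
import torch.nn as nn

N_CLASSES, NOISE_DIM, IMG_PIXELS = 4, 64, 28 * 28   # assumed defect classes and image size

class ConditionalGenerator(nn.Module):
    def __init__(self):
        super().__init__()
        self.label_embedding = nn.Embedding(N_CLASSES, 16)       # learned label representation
        self.net = nn.Sequential(
            nn.Linear(NOISE_DIM + 16, 128), nn.ReLU(),
            nn.Linear(128, IMG_PIXELS), nn.Tanh(),               # one flattened grayscale image
        )

    def forward(self, noise, labels):
        # Concatenating the label embedding with the noise makes the output class-specific.
        return self.net(torch.cat([noise, self.label_embedding(labels)], dim=1))

generator = ConditionalGenerator()
rare_class = torch.full((32,), 3, dtype=torch.long)              # an under-represented defect type
fake_images = generator(torch.randn(32, NOISE_DIM), rare_class)  # 32 extra samples for that class
print(fake_images.shape)                                         # torch.Size([32, 784])
```

Generated samples for the minority classes are then added to the training set, which is how the generative network rebalances the data before classification or segmentation models are trained.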
59

Framework pro tvorbu generátorů dat / Framework for Data Generators

Kříž, Blažej January 2012 (has links)
This master's thesis is focused on the problem of data generation. At the beginning it presents several applications for data generation and describes the data generation process; it then deals with the development of a framework for data generators and a demonstration application for validating the framework.
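A data-generator framework of this kind typically boils down to composable per-field generators and a record generator that combines them. The sketch below illustrates that structure; the field names, distributions, and schema are hypothetical and are not taken from the thesis.

```python
# Minimal sketch of a composable data-generator framework (hypothetical schema and fields).
import random
import string

def int_field(lo, hi):
    return lambda rng: rng.randint(lo, hi)

def string_field(length):
    return lambda rng: "".join(rng.choices(string.ascii_lowercase, k=length))

def choice_field(options):
    return lambda rng: rng.choice(options)

def record_generator(schema, seed=0):
    """Yield records whose fields are produced by the per-field generators in `schema`."""
    rng = random.Random(seed)
    while True:
        yield {name: gen(rng) for name, gen in schema.items()}

schema = {
    "id": int_field(1, 10_000),
    "name": string_field(8),
    "country": choice_field(["SE", "CZ", "DE"]),
}
records = record_generator(schema)
for _ in range(3):
    print(next(records))
```

New field types plug in as additional small functions, which is the extensibility such a generator framework is meant to provide.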
60

The Use of Big Data in Process Management : A Literature Study and Survey Investigation

Ephraim, Ekow Esson, Sehic, Sanel January 2021 (has links)
In recent years there has been increasing interest in understanding how organizations can use big data in their process management to create value and improve their processes. This interest stems from new challenges for process management arising from increasing competition and from the complexity of large data sets produced by technological advancements. Scholars describe such large data sets as big data: data so complex that traditional data analysis software is not sufficient to manage or analyze them. Because of the complexity of handling such great volumes of data, there is a large gap in practical examples of organizations that have incorporated big data in their process management. Therefore, in order to fill relevant gaps and contribute to advancements in this field, this thesis explores how big data can contribute to improved process management. The aim of this thesis was to investigate how, why, and to what extent big data is used in process management, and to outline the purposes and challenges of using big data in process management. This was accomplished through a literature review and a survey, conducted to understand how big data has previously been used to create value and improve processes in organizations. From the extensive literature review, an analysis matrix of how big data is used in process management is provided through the intersections between big data and process management dimensions. The analysis matrix shows that most of the instances in which big data was used in process management fall under process analysis & improvement and process control & agility. Simply put, organizations used big data in specific activities involved in process management but not in a holistic manner. Furthermore, the limited findings from the survey indicate that the main challenge and the main purpose of big data use in Swedish organizations are, respectively, the complexity of handling data and making statistically better decisions.
