Global ETD Search

61	Privacy-aware data generation : Using generative adversarial networks and differential privacy Hübinette, Felix January 2022 (has links) Today we are surrounded by IoT devices that constantly generate different kinds of data about their environment and their users. Much of this data could be useful for research and development, but a lot of it is privacy-sensitive for the individual. To protect the individual's privacy, we have data protection laws. But these legal restrictions also dramatically reduce the amount of data available for research and development. It would therefore be beneficial to find a workaround that respects people's privacy without breaking the law while still maintaining the usefulness of the data. The purpose of this thesis is to show how privacy-aware data that maintains data utility can be generated from a dataset by using Generative Adversarial Networks (GANs) and Differential Privacy (DP). This is useful because it allows privacy-preserving data to be shared, so that the data can be used in research and development with due regard for privacy. GANs are used to generate synthetic data. DP is a data anonymization technique. By combining the two techniques, we generate synthetic, privacy-aware data from an existing open-source Fitbit dataset. The specific GAN model used is CTGAN, and differential privacy is achieved with the help of Gaussian noise. The experimental results show many similarities between the original dataset and the experimental datasets. The experiments performed very well on the Kolmogorov-Smirnov test, with the lowest p-value across all experiments being 0.92. The conclusion drawn is that this is another promising methodology for creating privacy-aware synthetic data that maintains reasonable data utility while still using DP techniques to achieve data privacy.
GANS DP generative adversarial networks differential privacy data generation privacy security Computer Sciences Datavetenskap (datalogi) Natural Sciences Naturvetenskap Computer and Information Sciences Data- och informationsvetenskap
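As a rough illustration of two ingredients named in the abstract above, the sketch below perturbs a numeric column with Gaussian noise and compares it with the original using a two-sample Kolmogorov-Smirnov statistic. It is not the thesis's CTGAN pipeline. The function names, the noise scale `sigma`, and the step-count data are assumptions made for the example, and the noise is not calibrated to any formal privacy budget.

```python
import bisect
import random


def add_gaussian_noise(values, sigma, seed=0):
    """Return a copy of `values` with zero-mean Gaussian noise added to each entry."""
    rng = random.Random(seed)
    return [v + rng.gauss(0.0, sigma) for v in values]


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    gap = 0.0
    for x in a_sorted + b_sorted:
        cdf_a = bisect.bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect.bisect_right(b_sorted, x) / len(b_sorted)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


# Hypothetical daily step counts standing in for a Fitbit-style column.
rng = random.Random(42)
steps = [rng.gauss(8000, 2500) for _ in range(500)]

light = add_gaussian_noise(steps, sigma=100)    # small perturbation
heavy = add_gaussian_noise(steps, sigma=10000)  # large perturbation

ks_light = ks_statistic(steps, light)
ks_heavy = ks_statistic(steps, heavy)
```

A smaller statistic means the perturbed column stays closer in distribution to the original, which is the utility side of the privacy/utility trade-off the abstract describes.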
62	Analysis and comparison of interfacing, data generation and workload implementation in BigDataBench 4.0 and Intel HiBench 7.0 Barosen, Alexander, Dalin, Sadok January 2018 (has links) One of the major challenges in Big Data is the accurate and meaningful assessment of system performance. Unlike in other systems, minor differences in efficiency can escalate into large differences in cost and power consumption. While there are several tools on the market for measuring the performance of Big Data systems, few of them have been explored in depth. This report investigated the interfacing, data generation and workload implementations of two Big Data benchmarking suites, BigDataBench and HiBench. The purpose of the study was to establish the capabilities of each tool with regard to these three criteria. An exploratory and qualitative approach was used to gather information and analyze each benchmarking tool. Source code, documentation, and reports published by the developers were used as information sources. The results showed that BigDataBench and HiBench were designed similarly with regard to interfacing and data flow during the execution of a workload, with the exception of streaming workloads. BigDataBench provided more realistic data generation, while the data generation for HiBench was easier to control. With regard to workload design, the workloads in BigDataBench were designed to be applicable to multiple frameworks, while the workloads in HiBench were focused on the Hadoop family. In conclusion, neither benchmarking suite was superior to the other. They were designed for different purposes and should be applied on a case-by-case basis.
Big Data Benchmarking BigDataBench HiBench Analysis Comparison Interfacing Data Generation Big Data Benchmarking BigDataBench HiBench Analys Jämförelse Gränssnitt Datagenerering Computer and Information Sciences Data- och informationsvetenskap
63	Improving XRD Analysis with Machine Learning Drapeau, Rachel E. 14 August 2023 (has links) (PDF) X-ray diffraction analysis (XRD) is an inexpensive method to quantify the relative proportions of mineral phases in a rock or soil sample. However, the analytical software available for XRD requires extensive user input to choose phases to include in the analysis. Consequently, analysis accuracy depends greatly on the experience of the analyst, especially as the number of phases in a sample increases (Raven & Self, 2017; Omotoso, 2006). The purpose of this project is to test whether incorporating machine learning methods into XRD software can improve the accuracy of analyses by assisting in the phase-picking process. To provide a large enough sample of XRD patterns and their known compositions to train the machine learning models, I created a dataset of 1.5 million calculated XRD patterns of realistic mineral mixtures. These synthetic XRD patterns were calculated using crystal structure files from the American Mineralogist Crystal Structure Database (AMCSD) with mineral occurrence data from the Mineral Evolution Database (MED) to mimic geologic knowledge used by expert analysts. Using this dataset, I trained and refined a variety of machine learning models to determine which model is most accurate in identifying the correct mineral phases. X-ray diffraction analysis XRD machine learning Rietveld method crystal structure classification decision trees bagged decision trees data generation mineral mixture Physical Sciences and Mathematics
64	Porovnání přístupů ke generování umělých dat / Comparison of Approaches to Synthetic Data Generation Šejvlová, Ludmila January 2017 (has links) The diploma thesis deals with synthetic data, selected approaches to their generation together with a practical task of data generation. The goal of the thesis is to describe the selected approaches to data generation, capture their key advantages and disadvantages and compare the individual approaches to each other. The practical part of the thesis describes generation of synthetic data for teaching knowledge discovery using databases. The thesis includes a basic description of synthetic data and thoroughly explains the process of their generation. The approaches selected for further examination are random data generation, the statistical approach, data generation languages and the ReverseMiner tool. The thesis also describes the practical usage of synthetic data and the suitability of each approach for certain purposes. Within this thesis, educational data Hotel SD were created using the ReverseMiner tool. The data contain relations discoverable with SD (set-difference) GUHA-procedures.
65	Augmenting High-Dimensional Data with Deep Generative Models / Högdimensionell dataaugmentering med djupa generativa modeller Nilsson, Mårten January 2018 (has links) Data augmentation is a technique that can be performed in various ways to improve the training of discriminative models. Recent developments in deep generative models offer new ways of augmenting existing data sets. In this thesis, a framework for augmenting annotated data sets with deep generative models is proposed, together with a method for quantitatively evaluating the quality of the generated data sets. Using this framework, two data sets for pupil localization were generated with different generative models, including both well-established models and a novel model proposed for this purpose. The novel model was shown, both qualitatively and quantitatively, to generate the best data sets. A set of smaller experiments on standard data sets also revealed cases where this generative model could improve the performance of an existing discriminative model. The results indicate that generative models can be used to augment or replace existing data sets when training discriminative models.
GAN GANs machine learning deep learning generative model generative models deep generative model deep generative models generative adversarial networks VAE VAEs variational autoencoder variational autoencoders autoencoder auto encoder encoder decoder computer vision eye tracking pupil localization pupil eyes eye synthetic data big data data generation synthetic data generation neural networks neural network high-dimensional data high-resolution images. Computer Sciences Datavetenskap (datalogi)
66	TAIGA: uma abordagem para geração de dados de teste por meio de algoritmo genético para programas de processamento de imagens / TAIGA: an Approach to Test Image Generation for Image Processing Programs Using Genetic Algorithm Rodrigues, Davi Silva 24 November 2017 (has links) The massive presence of information systems in our lives has been increasing the importance of software testing activities. Image Processing (IP) programs have very complex input domains and, therefore, traditional testing of this kind of program, which is conducted mostly by hand, is a costly and error-prone task. In traditional testing, testers usually create input images themselves or select them at random from image databases, which can make it harder to reveal faults in the software under test. A systematic mapping study of the literature identified a gap concerning automated test data generation in the image domain. Thus, an approach for generating test data for IP programs by means of a genetic algorithm was proposed: TAIGA (Test imAge generatIon by Genetic Algorithm). This approach adapts traditional genetic operators (mutation and crossover) to the image domain and replaces the fitness function with an evaluation of the results of mutation testing. The proposed approach was validated through experiments involving eight distinct IP programs. TAIGA provided an increase of up to 38.61% in mutation score compared to traditional testing of IP programs. Automating test data generation is expected to raise the quality of image processing systems development and to reduce the cost of software testing activities in this domain.
Algoritmos genéticos Evolutionary test Genetic algorithms Geração de dados de teste Geração de imagens de teste Image processing Mutation score Mutation testing Processamento de imagens Software test Test data generation Test image generation Teste de mutação Teste de software Teste evolutivo
67	Automatização do teste estrutural de software de veículos autônomos para apoio ao teste de campo / Automated structural software testing of autonomous vehicle to support field testing Neves, Vânia de Oliveira 15 May 2015 (has links) An intelligent autonomous vehicle (or just autonomous vehicle, AV) is a type of embedded system that integrates physical (hardware) and computational (software) components. Its main feature is the ability to move and operate partially or fully autonomously. Autonomy grows with the ability to perceive and move within the environment, with robustness, and with the ability to solve and perform tasks while dealing with a wide range of situations (intelligence). Autonomous vehicles are an important research topic with a direct impact on society. However, as the field progresses, secondary problems arise, such as how to know whether these systems have been sufficiently tested. One of the testing phases of an AV is field testing, in which the vehicle is taken to a loosely controlled environment and must freely execute the mission for which it was programmed. Field testing is generally used to ensure that autonomous vehicles show the intended behavior, but it uses no information about the code structure. The vehicle (hardware and software) may pass the field test even though important parts of the code were never executed. During field testing, the input data are collected in logs that can later be analyzed to evaluate the test results and to perform other types of offline tests. This thesis presents a set of proposals to support the analysis of field testing from the point of view of structural testing. The approach is composed of a class model in the context of field testing, a tool that implements this model, and a genetic algorithm to generate test data. It also presents heuristics for reducing the data set contained in a log without substantially reducing the coverage obtained, as well as the combination and mutation strategies used in the algorithm. Case studies conducted to evaluate the heuristics and strategies are also presented and discussed.
Autonomous vehicles Geração de dados de teste Search-based testing Structural software testing Test data generation Teste baseado em busca Teste de veículos autônomos Teste estrutural de software Testing of autonomous vehicles Veículos autônomos
68	TAIGA: uma abordagem para geração de dados de teste por meio de algoritmo genético para programas de processamento de imagens / TAIGA: an Approach to Test Image Generation for Image Processing Programs Using Genetic Algorithm Davi Silva Rodrigues 24 November 2017 (has links) The massive presence of information systems in our lives has been increasing the importance of software testing activities. Image Processing (IP) programs have very complex input domains and, therefore, traditional testing of this kind of program, which is conducted mostly by hand, is a costly and error-prone task. In traditional testing, testers usually create input images themselves or select them at random from image databases, which can make it harder to reveal faults in the software under test. A systematic mapping study of the literature identified a gap concerning automated test data generation in the image domain. Thus, an approach for generating test data for IP programs by means of a genetic algorithm was proposed: TAIGA (Test imAge generatIon by Genetic Algorithm). This approach adapts traditional genetic operators (mutation and crossover) to the image domain and replaces the fitness function with an evaluation of the results of mutation testing. The proposed approach was validated through experiments involving eight distinct IP programs. TAIGA provided an increase of up to 38.61% in mutation score compared to traditional testing of IP programs. Automating test data generation is expected to raise the quality of image processing systems development and to reduce the cost of software testing activities in this domain.
Algoritmos genéticos Geração de dados de teste Geração de imagens de teste Mutation score Processamento de imagens Teste de mutação Teste de software Teste evolutivo Evolutionary test Genetic algorithms Image processing Mutation score Mutation testing Software test Test data generation Test image generation
69	Evaluating the error of measurement due to categorical scaling with a measurement invariance approach to confirmatory factor analysis Olson, Brent 05 1900 (has links) It has previously been determined that using 3 or 4 points on a categorized response scale will fail to produce a continuous distribution of scores. However, there is no evidence, thus far, revealing the number of scale points that may indeed possess an approximate or sufficiently continuous distribution. This study provides the evidence to suggest the level of categorization in discrete scales that makes them directly comparable to continuous scales in terms of their measurement properties. To do this, we first introduced a novel procedure for simulating discretely scaled data that was both informed and validated through the principles of the Classical True Score Model. Second, we employed a measurement invariance (MI) approach to confirmatory factor analysis (CFA) in order to directly compare the measurement quality of continuously scaled factor models to that of discretely scaled models. The simulated design conditions of the study varied with respect to item-specific variance (low, moderate, high), random error variance (none, moderate, high), and discrete scale categorization (number of scale points ranged from 3 to 101). A population analogue approach was taken with respect to sample size (N = 10,000). We concluded that there are conditions under which response scales with 11 to 15 scale points can reproduce the measurement properties of a continuous scale. Using response scales with more than 15 points may be, for the most part, unnecessary. Scales having from 3 to 10 points introduce a significant level of measurement error, and caution should be taken when employing such scales. The implications of this research and future directions are discussed. 
optimum number of scale points continuous scale discrete scale categorization coarseness measurement error Classical True Score Model simulation study data generation item specific variance random error variance longitudinal measurement invariance Comparative Fit Index Relative Multivariate Kurtosis
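The effect of categorical coarseness described in the abstract above can be illustrated with a small simulation. This is a simplified sketch, not the thesis's simulation procedure or its CFA-based analysis: it draws continuous scores, cuts them into `k` equal-width categories, and measures how strongly the coarsened scores correlate with the originals. The helper names, the clipping range of ±3, and the normal score distribution are assumptions chosen for the example.

```python
import math
import random


def discretize(score, k, low=-3.0, high=3.0):
    """Map a continuous score onto a k-point scale (1..k) using equal-width bins."""
    clipped = min(max(score, low), high)
    width = (high - low) / k
    return min(int((clipped - low) / width) + 1, k)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(1)
continuous = [rng.gauss(0.0, 1.0) for _ in range(10_000)]

# Correlation between the continuous scores and their k-point versions.
agreement = {
    k: pearson(continuous, [discretize(s, k) for s in continuous])
    for k in (3, 5, 11, 15, 101)
}
```

Printing `agreement` shows how much of the continuous signal each scale length retains under these assumptions; it does not reproduce the measurement-invariance tests on which the thesis bases its conclusions.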
70	Informationsentropische, spektrale und statistische Untersuchungen fahrzeuggenerierter Verkehrsdaten unter besonderer Berücksichtigung der Auswertung und Dimensionierung von FCD-Systemen / Entropical, Spectral and Statistical Analysis of Vehicle Generated Traffic Data with Special Consideration of the Evaluation and Dimension of FCD-Systems Gössel, Frank 18 April 2005 (has links) (PDF) The subject of this thesis is the interface between the traffic process and the information process in systems for vehicle-generated traffic data acquisition. The investigations concentrate on the primary quantity, speed. The main goal of the theoretical and practical investigations is the qualified determination of macroscopic traffic flow parameters from microscopic individual-vehicle data. One focus of the work is the analysis of microscopic individual-vehicle data by means of information-entropy and spectral considerations. These investigations aim to enable optimal use of the limited transmission and processing capacity in real FCD systems, to derive theoretical limits, and to give a theoretical justification for parameters of FCD systems used in practice. Starting from empirical and theoretical investigations, the entropy of the information source "speed profile" is determined. It is shown that speed profiles can be modeled as Markov sources. An optimal sampling interval is derived from the entropy dynamics of speed profiles. An analysis of the spectral properties of speed profiles shows that there are relationships between their spectra and the traffic state. Speed profiles have low-pass character. A power criterion is introduced for calculating the low-pass cutoff frequencies of empirical speed profiles. From the empirical cutoff frequencies determined in this way, an optimal sampling interval can be obtained whose value approximately matches the sampling interval derived from the entropy dynamics. A simple indicator of the dynamics of speed profiles is the coefficient of variation of the individual-vehicle speed. It is shown that collecting and transmitting coefficients of variation of individual-vehicle speeds in FCD systems is worthwhile. The thesis gives a theoretical justification of the required equipment rate (penetration rate) in FCD systems. The performance of FCD systems is assessed on the basis of a confidence estimate for the random variable travel speed. The method used is suitable for comparing the performance of FCD systems in different scenarios (urban, rural-road and motorway traffic). It is shown that in certain scenarios (urban traffic in particular) FC data must be fused with other traffic data. For the statistical dimensioning and evaluation of an FCD system, the coefficient of variation of the mean travel speeds of the vehicles in a vehicle collective (the collective coefficient of variation) is an essential parameter. It is shown that the collective coefficient of variation generally depends not only on the traffic state but also on the spatial and temporal structuring of the observation area. Models for the approximate determination of the collective coefficient of variation are derived and verified.
FCD Fahrzeuggenerierte Verkehrsdaten Floating Car Data Verkehrsdatengewinnung Verkehrsmonitoring FCD Floating Car Data ITS traffic data generation traffic monitoring vehicle generated traffic data ddc:620 rvk:ZO 4600 Straßenverkehr Verkehrsablauf Verkehrsinformation
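The role of the coefficient of variation in sizing a floating car data (FCD) system can be sketched as follows. This is an illustration under stated assumptions, not the method derived in the thesis: it uses the textbook normal-approximation sample size n ≥ (z · cv / e)², and the speed values, the function names, and the 10 % error target are all hypothetical.

```python
import math
import statistics


def coefficient_of_variation(speeds):
    """Sample standard deviation divided by the mean (dimensionless)."""
    return statistics.stdev(speeds) / statistics.mean(speeds)


def required_probe_vehicles(cv, rel_error, z=1.96):
    """Textbook normal-approximation sample size: n >= (z * cv / rel_error)^2."""
    return math.ceil((z * cv / rel_error) ** 2)


# Hypothetical mean travel speeds (km/h) reported by vehicles on one road section.
urban = [18.0, 35.0, 12.0, 41.0, 27.0, 9.0, 33.0, 22.0]
motorway = [112.0, 118.0, 105.0, 121.0, 109.0, 115.0, 117.0, 111.0]

cv_urban = coefficient_of_variation(urban)
cv_motorway = coefficient_of_variation(motorway)

# Vehicles needed to estimate the mean travel speed within +/-10 % at about 95 % confidence.
n_urban = required_probe_vehicles(cv_urban, rel_error=0.10)
n_motorway = required_probe_vehicles(cv_motorway, rel_error=0.10)
```

With these made-up numbers the urban section needs far more reporting vehicles than the motorway section, because its speeds vary much more relative to their mean.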
