  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
41

Generation of Software Test Data from the Design Specification Using Heuristic Techniques. Exploring the UML State Machine Diagrams and GA Based Heuristic Techniques in the Automated Generation of Software Test Data and Test Code.

Doungsa-ard, Chartchai January 2011
Software testing is a tedious and very expensive undertaking. This research therefore proposes automatic test data generation to reduce testers' workload and to help ascertain software quality. Test-driven development (TDD) has become increasingly popular in recent years; under TDD, test data should be prepared before code implementation begins. This research therefore asserts that test data should be generated from the software design documents, which are normally created before the code is written. Among such documents, UML state machine diagrams are selected as the basis for the proposed automated test data generation mechanism because they show the behaviours of a single object in the system. A genetic algorithm (GA) based approach has been developed and applied to search for a suitably sized set of good-quality test data. The generated test data are then used together with UML class diagrams to generate JUnit test code. The GA-based test data generation methods have been enhanced to handle the parallel path and loop problems of UML state machines. In addition, the proposed GA-based approach also targets diagrams with parameterised triggers. As a result, the proposed framework generates test data from the basic state machine diagram and the basic class diagram without any additional nonstandard information, whereas most other approaches require additional information or generate test data from other formal languages. The transition coverage achieved by the proposed approach is also high, so the generated test data can cover most of the behaviour of the system. / EU Asia-Link project TH/Asia Link/004(91712) East-West and CAMT
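
Illustrative sketch (not taken from the thesis): the abstract above describes a genetic algorithm that searches for test data achieving high transition coverage on a UML state machine. The Python below shows the general idea on a hypothetical four-transition vending-machine model. The state names, events, GA parameters and the simple truncation selection are all assumptions made for illustration; parallel paths, loops and parameterised triggers are not handled.

```python
import random

# Hypothetical toy state machine: (state, event) -> next state.
TRANSITIONS = {
    ("Idle", "insert_coin"): "Ready",
    ("Ready", "select"): "Dispensing",
    ("Ready", "cancel"): "Idle",
    ("Dispensing", "take_item"): "Idle",
}
EVENTS = sorted({event for _, event in TRANSITIONS})


def covered_transitions(events, start="Idle"):
    """Replay an event sequence and return the set of transitions it fires."""
    state, covered = start, set()
    for event in events:
        key = (state, event)
        if key in TRANSITIONS:  # events with no matching transition are ignored
            covered.add(key)
            state = TRANSITIONS[key]
    return covered


def fitness(events):
    """Transition coverage in [0, 1] for one candidate event sequence."""
    return len(covered_transitions(events)) / len(TRANSITIONS)


def evolve(pop_size=20, length=8, generations=50, mutation_rate=0.1, seed=0):
    """Evolve fixed-length event sequences towards full transition coverage."""
    rng = random.Random(seed)
    population = [[rng.choice(EVENTS) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        if fitness(population[0]) == 1.0:
            break
        parents = population[: pop_size // 2]  # truncation selection keeps the best half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, length)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Mutate each gene independently with probability mutation_rate.
            child = [rng.choice(EVENTS) if rng.random() < mutation_rate else e for e in child]
            children.append(child)
        population = parents + children
    return max(population, key=fitness)
```

Calling `evolve()` returns one event sequence; `fitness(...)` of that sequence is its transition coverage. A generated sequence of this kind could then be turned into test code, which the thesis does for JUnit.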
42

Synthetic data generation for domain adaptation of a retriever-reader Question Answering system for the Telecom domain : Comparing dense embeddings with BM25 for Open Domain Question Answering / Syntetisk data genering för domänadaptering av ett retriever-readerbaserat frågebesvaringssystem för telekomdomänen : En jämförelse av dense embeddings med BM25 för Öpen Domän frågebesvaring

Döringer Kana, Filip January 2023
Having computer systems capable of answering questions has been a goal within Natural Language Processing research for many years. Machine Learning systems have recently become increasingly proficient at this task, with large language models obtaining state-of-the-art performance. Retriever-reader architectures have become a powerful approach for building systems that let users enter questions and get factual answers from a corpus of documents. This architecture uses a retriever component that fetches the most relevant documents and a reader that in turn extracts the answer from those documents. These systems commonly use transformer-based models for both components, fine-tuned on a general domain of documents such as Wikipedia. However, the performance of such systems on new domains with different vocabularies can be lacking. Furthermore, new domains, for instance company-specific documents, often lack annotated data, which makes training new models cumbersome. This thesis investigated how a retriever-reader architecture can be adapted to a corpus of Telecom documents by generating question-answer data using a large generative language model, GPT3.5. It also compared a dense retriever using BERT with a BM25-based retriever on the domain. Findings suggest that generating training data can be an effective approach for fine-tuning a dense retriever, increasing the Top-K retrieval accuracy by 20 points for k = 10 compared to a dense retriever fine-tuned on Wikipedia. Additionally, the sparse (BM25-based) retriever outperforms the best dense retriever, although there is reason to believe that the structure of the test dataset could influence this. Finally, the results indicate that the performance of the reader is not improved by the generated data, although future work is needed to draw firmer conclusions.
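
Illustrative sketch (not taken from the thesis): the abstract above compares retrievers by Top-K retrieval accuracy and uses BM25 as the sparse baseline. The Python below shows one common formulation of Okapi BM25 and a Top-K accuracy measure on a hypothetical three-document corpus. The corpus, queries, whitespace tokenisation and the parameter values `k1=1.5` and `b=0.75` are assumptions for illustration, not the thesis's setup.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenised document against a tokenised query with Okapi BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    doc_freq = Counter(term for d in docs for term in set(d))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Smoothed inverse document frequency (always positive).
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Term-frequency saturation with document-length normalisation.
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def top_k_accuracy(queries, gold_ids, docs, k):
    """Fraction of queries whose gold document appears among the k best-scored."""
    hits = 0
    for query, gold in zip(queries, gold_ids):
        scores = bm25_scores(query, docs)
        ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
        hits += gold in ranked[:k]
    return hits / len(queries)


# Hypothetical toy corpus and questions, tokenised on whitespace.
DOCS = [
    "the base station handles radio access".split(),
    "the core network routes user traffic".split(),
    "billing records are stored per subscriber".split(),
]
QUERIES = [
    "which node handles radio access".split(),
    "where are billing records stored".split(),
]
GOLD = [0, 2]  # index of the document that answers each query
```

`top_k_accuracy(QUERIES, GOLD, DOCS, k=1)` evaluates the sparse retriever on the toy data. A dense retriever would be evaluated with the same function by swapping `bm25_scores` for an embedding-similarity scorer, so a "20 point" gain corresponds to a 0.20 difference in this value.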
44

Development of artificial intelligence-based in-silico toxicity models : data quality analysis and model performance enhancement through data generation

Malazizi, Ladan January 2008
Toxic compounds, such as pesticides, are routinely tested against a range of aquatic, avian and mammalian species as part of the registration process. The need to reduce dependence on animal testing has led to increasing interest in alternative methods such as in silico modelling. QSAR (Quantitative Structure-Activity Relationship) based models are already in use for predicting physicochemical properties, environmental fate, eco-toxicological effects, and specific biological endpoints for a wide range of chemicals. Data plays an important role in modelling QSARs and in analysing results for toxicity testing processes. This research addresses a number of issues in predictive toxicology. One issue is data quality. Although a large amount of toxicity data is available from online sources, this data may contain unreliable samples and may be of low quality. Its presentation may also be inconsistent across sources, which makes accessing, interpreting and comparing the information difficult. To address this issue, we started with a detailed investigation of, and experimental work on, the DEMETRA datasets, which were produced by the EC-funded project DEMETRA. Based on the investigation, the experiments and the results obtained, the author identified a number of data quality criteria to support data evaluation in the toxicology domain. An algorithm has also been proposed to assess data quality before modelling. Another issue considered in the thesis is missing values in toxicology datasets. The Least Squares Method for a paired dataset and Serial Correlation for a single-version dataset provided solutions in these two different situations. A procedural algorithm using these two methods has been proposed to overcome the problem of missing values. Another issue addressed in this thesis is the modelling of multi-class datasets in which the class samples are severely imbalanced. Imbalanced data affect the performance of classifiers during the classification process. We have shown that, as long as we understand how class members are constructed in dimensional space in each cluster, we can reform the distribution and provide more domain knowledge for the classifier.
45

Uma abordagem para geração de dados de teste para o teste de mutação utilizando técnicas baseadas em busca / An approach for test data generation in mutation testing using search-based techniques

Souza, Francisco Carlos Monteiro 24 May 2017
Mutation testing is a powerful test criterion for detecting faults and measuring the effectiveness of a test data set. However, it is a computationally expensive testing technique. The high cost comes mainly from the effort needed to generate adequate test data to kill the mutants and from the existence of equivalent mutants. This thesis presents an approach called Reach, Infect and Propagation to Mutation Testing (RIP-MuT) that generates test data and suggests equivalent mutants. The approach is composed of two modules: (i) automated test data generation using hill climbing and a fitness scheme based on the Reach, Infect, and Propagate (RIP) conditions; and (ii) a method that suggests equivalent mutants by analysing the RIP conditions during test data generation. Experiments were conducted to evaluate the effectiveness of the RIP-MuT approach, together with a comparative study against a genetic algorithm (GA) and random testing. The RIP-MuT approach achieved a mean mutation score 18.25% higher than the GA and 35.93% higher than random testing. The proposed method for detecting equivalent mutants proved feasible for reducing the cost of this activity, since it obtained a precision of 75.05% in suggesting equivalent mutants. The results therefore indicate that the approach produces effective test data able to strongly kill the majority of mutants in C programs, and that it can also assist in suggesting equivalent mutants correctly.
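
Illustrative sketch (not taken from the thesis): the abstract above describes hill climbing guided by a fitness based on the Reach, Infect and Propagate conditions. The Python below shows the idea on a hypothetical one-line program and a hypothetical mutant. The functions, the mutated constant, the distance-based fitness and the neighbourhood of plus or minus 1 and 10 are all assumptions. In this toy case the mutated statement is always reached and an infected predicate always changes the output, so the fitness only needs to measure how far an input is from infecting the state.

```python
def original(x):
    """Hypothetical program under test."""
    return "high" if x > 100 else "low"


def mutant(x):
    """Hypothetical mutant: the constant 100 was replaced by 110."""
    return "high" if x > 110 else "low"


def fitness(x):
    """Distance to the nearest input that infects the state (0 means infected).

    The two predicates disagree exactly when 100 < x <= 110.
    """
    if 100 < x <= 110:
        return 0
    return 101 - x if x <= 100 else x - 110


def hill_climb(start, max_steps=10_000):
    """Greedy descent over integer inputs using +/-1 and +/-10 neighbours."""
    current = start
    for _ in range(max_steps):
        if fitness(current) == 0:
            return current  # this input kills the mutant
        best = min((current + d for d in (-10, -1, 1, 10)), key=fitness)
        if fitness(best) >= fitness(current):
            return None  # local optimum: no neighbour improves
        current = best
    return None
```

For example, `hill_climb(0)` walks upward until it finds an input on which `original` and `mutant` return different results. If no such input can be found for a mutant after a thorough search, that is a hint (not a proof) that the mutant may be equivalent, which is the intuition behind the second module described in the abstract.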
46

Seleção entre estratégias de geração automática de dados de teste por meio de métricas estáticas de softwares orientados a objetos / Selection between whole test generation strategies by analysing object oriented software static metrics

Ramos, Gustavo da Mota 09 October 2018
Software products of varying complexity are created daily in response to complex and varied demands under tight deadlines. High levels of quality are expected of these products, yet as they become more complex, quality may fall below an acceptable level because the time available for testing does not keep pace with the complexity. Software testing and automatic test data generation therefore aim to deliver high-quality products through low-cost, rapid test activities. In this context, however, software developers depend on automatic test generation strategies and, above all, on selecting the most adequate technique to obtain the greatest possible code coverage. This is an important factor, given that each test data generation technique has peculiarities and problems that make it better suited to certain types of software. Against this background, the present work proposes selecting the appropriate technique for each class of a software system based on the class's characteristics, expressed through object-oriented software metrics, using the Naive Bayes classification algorithm. Initially, a literature review of two generation algorithms, random search and genetic search, was carried out to understand their advantages and disadvantages in both implementation and execution. The CK metrics were also studied to understand how they can best describe the characteristics of a class. The knowledge acquired made it possible to collect, for each class, test generation data such as code coverage and generation time for each technique, along with the CK metrics, so that these data could be analysed together and the classification algorithm finally executed. The results of this analysis demonstrated that a reduced, selected set of CK metrics is more efficient and better describes the characteristics of a class than the complete set. They also showed that the CK metrics have little or no influence on the generation time of the test data or on the random search algorithm. However, the CK metrics showed a moderate correlation with, and influence on, the selection of the genetic algorithm, and thus contributed to its selection by the Naive Bayes algorithm.
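
Illustrative sketch (not taken from the thesis): the abstract above describes using a Naive Bayes classifier over CK metrics to choose a test generation technique per class. The Python below is a minimal Gaussian Naive Bayes written from scratch. The two features (WMC and CBO, both CK metrics), the metric values and the labels are made-up toy data, and the Gaussian variant is an assumption, since the abstract does not say which Naive Bayes variant was used.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(rows, labels):
    """Estimate per-class priors plus per-feature means and variances."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)
    model = {}
    for label, members in grouped.items():
        columns = list(zip(*members))
        means = [sum(c) / len(c) for c in columns]
        variances = [
            sum((x - m) ** 2 for x in c) / len(c) + 1e-9  # small floor avoids division by zero
            for c, m in zip(columns, means)
        ]
        model[label] = (len(members) / len(rows), means, variances)
    return model


def predict(model, row):
    """Return the class with the highest log-posterior for one feature vector."""
    def log_posterior(label):
        prior, means, variances = model[label]
        total = math.log(prior)
        for x, m, v in zip(row, means, variances):
            total += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        return total
    return max(model, key=log_posterior)


# Hypothetical training data: (WMC, CBO) per class, labelled with the
# generation technique that achieved better coverage on that class.
ROWS = [(30, 12), (28, 10), (35, 14), (4, 1), (6, 2), (5, 1)]
LABELS = ["genetic", "genetic", "genetic", "random", "random", "random"]
MODEL = fit_gaussian_nb(ROWS, LABELS)
```

`predict(MODEL, (32, 11))` then suggests a technique for an unseen class with those metric values. Restricting `ROWS` to a subset of metric columns is one way to try the reduced metric set that the abstract reports as more effective.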
49

Generation of Synthetic Data with Generative Adversarial Networks

Garcia Torres, Douglas January 2018
The aim of synthetic data generation is to provide data that is not real for cases where the use of real data is somehow limited, for example when larger volumes of data are needed, when the data is sensitive, or simply when the real data is hard to access. Traditional methods of synthetic data generation use techniques that do not attempt to replicate important statistical properties of the original data. Properties such as the distribution, the patterns or the correlation between variables are often omitted. Moreover, most existing tools and approaches require a great deal of user-defined rules and do not make use of advanced techniques such as Machine Learning or Deep Learning. While Machine Learning is an innovative area of Artificial Intelligence and Computer Science that uses statistical techniques to give computers the ability to learn from data, Deep Learning is a closely related field based on learning data representations, which may prove useful for synthetic data generation. This thesis focuses on one of the most interesting and promising innovations of recent years in the Machine Learning community: Generative Adversarial Networks. An approach for generating discrete, continuous or text synthetic data with Generative Adversarial Networks is proposed, tested, evaluated and compared with a baseline approach. The results demonstrate the feasibility of this framework and show its advantages and disadvantages. Despite its high demand for computational resources, a Generative Adversarial Networks framework is capable of generating quality synthetic data that preserves the statistical properties of a given dataset.
50

Energy-Efficient Private Forecasting on Health Data using SNNs / Energieffektiv privat prognos om hälsodata med hjälp av SNNs

Di Matteo, Davide January 2022
Health monitoring devices, such as Fitbit, are gaining popularity both as wellness tools and as a source of information for healthcare decisions. Predicting such wellness goals accurately is critical for the users to make informed lifestyle choices. The core objective of this thesis is to design and implement such a system that takes energy consumption and privacy into account. This research is modelled as a time-series forecasting problem that makes use of Spiking Neural Networks (SNNs) due to their proven energy-saving capabilities. Thanks to their design that closely mimics natural neural networks (such as the brain), SNNs have the potential to significantly outperform classic Artificial Neural Networks in terms of energy consumption and robustness. In order to prove our hypotheses, a previous research by Sonia et al. [1] in the same domain and with the same dataset is used as our starting point, where a private forecasting system using Long short-term memory (LSTM) is designed and implemented. Their study also implements and evaluates a clustering federated learning approach, which fits well the highly distributed data. The results obtained in their research act as a baseline to compare our results in terms of accuracy, training time, model size and estimated energy consumed. Our experiments show that Spiking Neural Networks trades off accuracy (2.19x, 1.19x, 4.13x, 1.16x greater Root Mean Square Error (RMSE) for macronutrients, calories burned, resting heart rate, and active minutes respectively), to grant a smaller model (19% less parameters an 77% lighter in memory) and a 43% faster training. Our model is estimated to consume 3.36μJ per inference, which is much lighter than traditional Artificial Neural Networks (ANNs) [2]. The data recorded by health monitoring devices is vastly distributed in the real-world. Moreover, with such sensitive recorded information, there are many possible implications to consider. 
For these reasons, we apply the clustering federated learning implementation [1] to our use case. However, adopting such techniques can be challenging, since it is difficult to learn from irregular data sequences. We use a two-step streaming clustering approach to classify customers based on their eating and exercise habits. Training a separate model for each group of users has been shown to be useful, particularly in terms of training time; however, this depends strongly on the cluster size. Our experiments show that both error and training time decrease, provided the clusters contain enough data to train the models. Finally, this study addresses data privacy by using state-of-the-art differential privacy. We apply ε-differential privacy to both our baseline model (trained on the whole dataset) and our federated-learning-based approach. With a privacy budget of ε = 0.1, our experiments report an increase in the measured average error (RMSE) of only 25%: specifically, +23.13%, +25.71%, +29.87%, and +21.57% for macronutrients (grams), calories burned (kcal), resting heart rate (beats per minute, bpm), and active minutes (minutes), respectively.
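
As a rough illustration of the privacy figures quoted in the abstract, the sketch below checks that the four per-target RMSE increases average to about 25%, and shows one common way of obtaining ε-differential privacy. The abstract does not say which mechanism the thesis uses, so the Laplace mechanism here is an assumption, and the function names and the sensitivity value of 1.0 are hypothetical, chosen only for illustration.

```python
import random

# Per-target RMSE increases (percent) reported in the abstract for epsilon = 0.1.
REPORTED_INCREASE = {
    "macronutrients": 23.13,
    "calories_burned": 25.71,
    "resting_heart_rate": 29.87,
    "active_minutes": 21.57,
}


def mean_increase(increases):
    """Unweighted mean of the per-target RMSE increases, in percent."""
    return sum(increases.values()) / len(increases)


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def privatize(value, sensitivity, epsilon, rng):
    """Laplace mechanism (assumed, not confirmed by the abstract).

    Adds noise with scale sensitivity / epsilon, so a smaller epsilon
    means more noise and a stronger privacy guarantee.
    """
    return value + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    print(f"mean RMSE increase: {mean_increase(REPORTED_INCREASE):.2f}%")
    # Hypothetical sensitivity of 1.0; epsilon = 0.1 as in the abstract.
    print(privatize(70.0, sensitivity=1.0, epsilon=0.1, rng=rng))
```

With ε = 0.1 and a sensitivity of 1.0 the noise scale is 10, which gives an intuition for why a small privacy budget costs accuracy.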
