Global ETD Search

21	Influence of Automatically Constructed Non-Equivalent Mutants on Predictions of Metamorphic Relations Götborg, Johan January 2023 (has links) Behovet av tillförlitliga, motståndskraftiga, och beständiga system är uppenbart i vårt samhälle, som i ökande grad blir allt mer beroende av mjukvarulösningar. För att uppnå tillfredsställande nivåer av säkerhet och robusthet måste alla system kontinuerligt genomgå tester. En av de största utmaningarna vid automatisering av programvarutestning är avsaknaden av tillförlitliga orakel kapabla att ge korrekta bedömningar av testfall. Metamorfisk testning är en metod som har visats möjlig att applicera för automatisering av testning, men som däremot kräver identifiering av metamorfiska relationer. Det har gjorts försök att identifiera metamorfiska relationer med hjälp av vissa maskininlärningsmodellers förmåga till mönsterigenkänning. Ett stort problem för sådana tillvägagångssätt är mängden tillgängliga och användbara data som dessa ML-modeller kan tränas på. Det huvudsakliga bidraget denna uppsats levererar är en automatiserad metod för att genomföra utökning av data genom källkodsmutation i syfte att skala befintliga datamängder. Specifikt behandlar denna uppsats producering av icke-ekvivalenta mutanter och deras inverkan på maskininlärningsassisterad identifiering av metamorfiska relationer. Resultaten visar att icke-ekvivalenta mutanter kan genereras effektivt, även om manuell granskning är nödvändig för att härleda korrekta etiketter för varje datapunkt. Det visas också att icke-ekvivalenta mutanter kan påverka klassificeringsprestandan positivt i vissa fall, även om resultaten varierar beroende på mutationsoperator och behandlad metamorfisk relation. Framgångsrika framsteg inom testautomatisering kan potentiellt påverka nuvarande standarder för programvaruutveckling genom att förbättra programvarutestningspraxis. 
Därmed bidrar denna studie till diskussionen om hur automatiserad programvarutestning kan påverka organisationens prestationsförmåga i ett bredare perspektiv. Diskussionen baseras på ramverket för balanserade styrkort, och slutsatsen visar att testautomatisering kan generera fördelaktiga resultat på flera fronter. Det är dock viktigt att samordna sådana initiativ med organisationens strategiska inriktning och långsiktiga mål. / The need for reliable, resilient, and persistent systems is evident in our society, which is becoming increasingly dependent on software solutions. In order to achieve satisfactory levels of security and robustness, all systems continuously need to undergo testing to detect faults and unwanted functionality. One of the major issues in automating software testing is the lack of reliable oracles capable of deriving test case verdicts. Metamorphic testing has been identified as a testing technique which can be used for test automation, though it requires the identification of metamorphic relations. There have been attempts at identifying metamorphic relations using the pattern recognition capabilities of certain machine learning models. A significant problem for any such approach is obtaining a sufficiently large labeled dataset which the ML models can be trained on. The main contribution of this paper is an automated approach to performing data augmentation through a process of source code mutation with the aim of scaling existing datasets. Specifically, this paper considers the generation of non-equivalent mutants and their impact on machine learning assisted identification of metamorphic relations. The results show that non-equivalent mutants can be efficiently generated, although manual oversight is necessary to derive accurate labels for each sample. 
It is also shown that non-equivalent mutants can positively impact the classification performance in certain instances, though results vary depending on the mutation operator and the metamorphic relation considered. Furthermore, successful advances in the area of test automation can potentially affect current software development standards by improving software testing practices. As such, this study adds to the discussion on how automated software testing might affect organizational performance. The discussion is based on the balanced scorecard framework and concludes that test automation can generate beneficial performance outcomes. However, it is imperative to align such endeavours with the strategic direction and long-term objectives of the organization. Balanced scorecard Data augmentation Machine learning Metamorphic testing MuJava Mutation testing Computer and Information Sciences Data- och informationsvetenskap
22	Low-Resource Natural Language Understanding in Task-Oriented Dialogue Louvan, Samuel 11 March 2022 (has links) Task-oriented dialogue (ToD) systems need to interpret the user's input to understand the user's needs (intent) and corresponding relevant information (slots). This process is performed by a Natural Language Understanding (NLU) component, which maps the text utterance into a semantic frame representation, involving two subtasks: intent classification (text classification) and slot filling (sequence tagging). Typically, new domains and languages are regularly added to the system to support more functionalities. Collecting domain-specific data and performing fine-grained annotation of large amounts of data every time a new domain and language is introduced can be expensive. Thus, developing an NLU model that generalizes well across domains and languages with less labeled data (low-resource) is crucial and remains challenging. This thesis focuses on investigating transfer learning and data augmentation methods for low-resource NLU in ToD. Our first contribution is a study of the potential of non-conversational text as a source for transfer. Most transfer learning approaches assume labeled conversational data as the source task and adapt the NLU model to the target task. We show that leveraging similar tasks from non-conversational text improves performance on target slot filling tasks through multi-task learning in low-resource settings. Second, we propose a set of lightweight augmentation methods that apply data transformation on token and sentence levels through slot value substitution and syntactic manipulation. Despite its simplicity, the performance is comparable to deep learning-based augmentation models, and it is effective on six languages on NLU tasks. Third, we investigate the effectiveness of domain adaptive pre-training for zero-shot cross-lingual NLU. 
In terms of overall performance, continued pre-training in English is effective across languages. This result indicates that the domain knowledge learned in English is transferable to other languages. In addition to that, domain similarity is essential. We show that intermediate pre-training data that is more similar – in terms of data distribution – to the target dataset yields better performance.
23	Convolutional Neural Networks for Classification of Metastatic Tissue in Lymph Nodes : How Does Cutout Affect the Performance of Convolutional Neural Networks for Biomedical Image Classification? / Convolutional Neural Networks för att klassificera förekomsten av metastatisk vävnad i lymfkörtlarna Ericsson, Andreas, Döringer Kana, Filip January 2021 (has links) One in eight women will suffer from breast cancer in their lifetime, making it the most common type of cancer for women. Successful treatment depends heavily on identifying metastatic tissue, which is cancer found beyond the initial tumour. Using deep learning within biomedical analysis has become an effective approach. However, its success is very dependent on large datasets. Data augmentation is a way to enhance datasets without requiring more annotated data. One way of doing this is the cutout method, which masks parts of an input image. Our research focused on investigating how the cutout method could improve the performance of Convolutional Neural Networks for classifying metastatic tissue on the Patch Camelyon dataset. Our research showed that improvements in performance can be achieved by using the cutout method. Further, our research suggests that using a non-label-preserving version of cutout is better than a label-preserving version. The largest improvement in accuracy was seen when we used a randomly sized cutout mask. The experiment resulted in an increase in accuracy of 3.6 percentage points, from the baseline of 82.3% to 85.9%. The cutout method was also compared with, and used in conjunction with, other well-established data augmentation techniques. Our conclusion is that cutout can be a competitive form of data augmentation that can be used both with and without other data augmentation techniques. / Var åttonde kvinna drabbas under sin livstid av bröstcancer. Detta gör det till den vanligaste formen av cancer för kvinnor. 
En framgångsrik behandling är beroende av att kunna identifiera metastatisk vävnad, vilket är cancer som spridit sig bortom den ursprungliga tumören. Att använda djupinlärning inom biomedicinsk analys har blivit en effektiv metod. Dock är dess framgång väldigt beroende av stora datamängder. Dataförstärkning är olika sätt att förbättra en mängd data som inte innebär att addera ytterligare annoterad data. Ett sätt att göra detta är genom en metod som kallas Cutout som maskar en del av en bild. Vår studie undersöker hur Cutout påverkar resultatet när Convolutional Neural Networks klassificerar huruvida bilder från datasetet Patch Camelyon innehåller metastaser eller inte. Vår studie visar att användandet av Cutout kan innebära förbättringar i resultatet. Dessutom tyder vår studie på att resultatet förbättras än mer om även delen av bilden som kan innehålla metastaser kan maskas ut. Den största förbättringen i resultatet var när maskningen var av varierande storlek från bild till bild. Resultatet förbättrades från 82.3% korrekta klassifikationer utan någon dataförstärkning till 85.9% med den bästa versionen av Cutout. Cutout jämfördes också, och användes tillsammans med, andra väletablerade dataförstärkningsmetoder. Vår slutsats är att Cutout är en dataförstärkningsmetod med potential att vara användbar såväl med som utan andra dataförstärkningsmetoder. Convolutional neural network CNN breast cancer computer aided diagnostics data augmentation Computer Sciences Datavetenskap (datalogi)
24	Bayesian estimation of factor analysis models with incomplete data Merkle, Edgar C. 10 October 2005 (has links) No description available. Bayesian computation Factor analysis Missing data Incomplete data Data augmentation Multiple imputation
25	Bayesian Probit Regression Models for Spatially-Dependent Categorical Data Berrett, Candace 02 November 2010 (has links) No description available. Statistics spatial statistics latent variable methods binary data categorical data data augmentation MCMC classification
26	Enhancing Text Readability Using Deep Learning Techniques Alkaldi, Wejdan 20 July 2022 (has links) In the information era, reading becomes more important to keep up with the growing amount of knowledge. The ability to read a document varies from person to person depending on their skills and knowledge. It also depends on the readability level of the text, whether it matches the reader’s level or not. In this thesis, we propose a system that uses state-of-the-art technology in machine learning and deep learning to classify and simplify a text, taking into consideration the reader’s level of reading. The system classifies any text to its equivalent readability level. If the text readability level is higher than the reader’s level, i.e. too difficult to read, the system performs text simplification to meet the desired readability level. The classification and simplification models are trained on data annotated with readability levels from the Newsela corpus. The trained simplification model operates at the sentence level, simplifying a given text to match a specific readability level. Moreover, the trained classification model is used to classify more unlabelled sentences using Wikipedia Corpus and Mechanical Turk Corpus in order to enrich the text simplification dataset. The augmented dataset is then used to improve the quality of the simplified sentences. The system generates simplified versions of a text based on the desired readability levels. This can help people with low literacy to read and understand any documents they need. It can also be beneficial to educators who assist readers with different reading levels. NLP Text Simplification Text Classification Deep Learning Reinforcement Learning Data Augmentation Natural Language Processing
27	Data Augmentation Approaches for Automatic Speech Recognition Using Text-to-Speech / 音声認識のための音声合成を用いたデータ拡張手法 Ueno, Sei 23 March 2022 (has links) 京都大学 / 新制・課程博士 / 博士(情報学) / 甲第24027号 / 情博第783号 / 新制||情||133(附属図書館) / 京都大学大学院情報学研究科知能情報学専攻 / (主査)教授河原達也, 教授黒橋禎夫, 教授西野恒 / 学位規則第4条第1項該当 / Doctor of Informatics / Kyoto University / DFAM Speech Recognition Data Augmentation Domain Adaptation Text-to-Speech Speech-to-Text 007
28	Maintenance Data Augmentation, using Markov Chain Monte Carlo Simulation : (Hamiltonian MCMC using NUTS) Roohani, Muhammad Ammar January 2024 (has links) Reliable and efficient utilization and operation of any engineering asset requires carefully designed maintenance planning, and maintenance-related data in the form of failure times, repair times, Mean Time Between Failures (MTBF), conditioning data, etc. plays a pivotal role in maintenance decision support. With the advancement in data analytics sciences and industrial artificial intelligence, maintenance-related data is being used for maintenance prognostics modeling to predict future maintenance requirements, which form the basis of maintenance design and planning in any maintenance-conscious industry such as railways. The lack of such available data creates a number of different types of problems in data-driven prognostics modelling. Researchers have employed a few methods to counter the problems caused by this lack of available data. The proposed methodology involves a data augmentation technique using Markov Chain Monte Carlo (MCMC) simulation to enhance maintenance data for use in maintenance prognostics modeling, which can serve as a basis for better maintenance decision support and planning. Maintenance Engineering Data Augmentation Prognostics Modeling Maintenance Analytics Markov Chain Monte Carlo Simulation Civil Engineering Samhällsbyggnadsteknik
29	Generative Data Augmentation: Using DCGAN To Expand Training Datasets For Chest X-Ray Pneumonia Detection Maier, Ryan D 01 June 2024 (has links) (PDF) Recent advancements in computer vision have demonstrated remarkable success in image classification tasks, particularly when provided with an ample supply of accurately labeled images for training. These techniques have also exhibited significant potential in revolutionizing computer-aided medical diagnosis by enabling the segmentation and classification of medical images, leveraging Convolutional Neural Networks (CNNs) and similar models. However, the integration of such technologies into clinical practice faces notable challenges. Chief among these is the obstacle of acquiring high-quality medical imaging data for training purposes. Patient privacy concerns often hinder researchers from accessing large datasets, while less common medical conditions pose additional hurdles due to scarcity of relevant data. This study aims to address the issue of insufficient data availability in medical imaging analysis. We present experiments employing Deep Convolutional Generative Adversarial Networks (DCGANs) to augment training datasets of chest X-ray images, specifically targeting the identification of pneumonia-affected lungs using CNNs. Our findings demonstrate that DCGAN-based generative data augmentation consistently enhances classification performance, even when training sets are severely limited in size. Machine Learning Deep Learning Image Generation Data Augmentation Medical Imaging Biomedical Other Engineering
30	Methods for data and user efficient annotation for multi-label topic classification / Effektiva annoteringsmetoder för klassificering med multipla klasser Miszkurka, Agnieszka January 2022 (has links) Machine Learning models trained using supervised learning can achieve great results when a sufficient amount of labeled data is used. However, the annotation process is a costly and time-consuming task. There are many methods devised to make the annotation pipeline more user and data efficient. This thesis explores techniques from the Active Learning, Zero-shot Learning, and Data Augmentation domains, as well as pre-annotation with revision, in the context of multi-label classification. The goal of Active Learning is to choose the most informative samples for labeling. Contrastive Active Learning, a state-of-the-art Active Learning technique, was adapted to the multi-label case. Once there is some labeled data, we can augment samples to make the dataset more diverse. English-German-English Backtranslation was used to perform Data Augmentation. Zero-shot learning is a setup in which a Machine Learning model can make predictions for classes it was not trained to predict. Zero-shot via Textual Entailment was leveraged in this study and its usefulness for pre-annotation with revision was reported. The results on the Reviews of Electric Vehicle Charging Stations dataset show that it may be beneficial to use Active Learning and Data Augmentation in the annotation pipeline. Active Learning methods such as Contrastive Active Learning can identify samples belonging to the rarest classes, while Data Augmentation via Backtranslation can improve performance, especially when little training data is available. The results for Zero-shot Learning via Textual Entailment experiments show that this technique is not suitable for the production environment. / Klassificeringsmodeller som tränas med övervakad inlärning kan uppnå goda resultat när en tillräcklig mängd annoterad data används. 
Annoteringsprocessen är dock en kostsam och tidskrävande uppgift. Det finns många metoder utarbetade för att göra annoteringspipelinen mer användar- och dataeffektiv. Detta examensarbete utforskar tekniker från områdena Active Learning, Zero-shot Learning, Data Augmentation, samt pre-annotering, där annoterarens roll är att verifiera eller revidera en klass föreslagen av systemet. Målet med Active Learning är att välja de mest informativa datapunkterna för annotering. Contrastive Active Learning utökades till fallet där en datapunkt kan tillhöra flera klasser. Om det redan finns några annoterade data kan vi utöka datamängden med artificiella datapunkter, med syfte att göra datasetet mer mångsidigt. Engelsk-Tysk-Engelsk översättning användes för att konstruera sådana artificiella datapunkter. Zero-shot-inlärning är en teknik i vilken en maskininlärningsmodell kan göra förutsägelser för klasser som den inte var tränad att förutsäga. Zero-shot via Textual Entailment utnyttjades i denna studie för att utöka datamängden med artificiella datapunkter. Resultat från datamängden “Reviews of Electric Vehicle Charging Stations” visar att det kan vara fördelaktigt att använda Active Learning och Data Augmentation i annoteringspipelinen. Active Learning-metoder som Contrastive Active Learning kan identifiera datapunkter som tillhör de mest sällsynta klasserna, medan Data Augmentation via Backtranslation kan förbättra klassificerarens prestanda, särskilt när få träningsdata finns tillgänglig. Resultaten för Zero-shot Learning visar att denna teknik inte är lämplig för en produktionsmiljö. Natural Language Processing Multi-label text classification Active Learning Zero-shot learning Data Augmentation Data-centric AI Naturlig språkbehandling Textklassificering med multipla klasser Active Learning Zero-shot learning Data Augmentation Datacentrerad AI Computer and Information Sciences Data- och informationsvetenskap

Search results