  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
21

Influence of Automatically Constructed Non-Equivalent Mutants on Predictions of Metamorphic Relations

Götborg, Johan January 2023
The need for reliable, resilient, and persistent systems is evident in our society, which is becoming increasingly dependent on software solutions. To achieve satisfactory levels of security and robustness, all systems need to undergo continuous testing to detect faults and unwanted functionality. One of the major issues in automating software testing is the lack of reliable oracles capable of deriving test case verdicts. Metamorphic testing has been identified as a testing technique that can be used for test automation, though it requires the identification of metamorphic relations. There have been attempts at identifying metamorphic relations using the pattern recognition capabilities of certain machine learning models. A significant problem for any such approach is obtaining a sufficiently large labeled dataset on which the ML models can be trained. The main contribution of this paper is an automated approach to data augmentation through source code mutation, with the aim of scaling existing datasets. Specifically, this paper considers the generation of non-equivalent mutants and their impact on machine-learning-assisted identification of metamorphic relations. The results show that non-equivalent mutants can be generated efficiently, although manual oversight is necessary to derive accurate labels for each sample. It is also shown that non-equivalent mutants can positively impact classification performance in certain instances, though results vary depending on the mutation operator and the metamorphic relation considered.
Furthermore, successful advances in test automation can potentially affect current software development standards by improving software testing practices. As such, this study adds to the discussion on how automated software testing might affect organizational performance. The discussion is based on the balanced scorecard framework and concludes that test automation can generate beneficial performance outcomes. However, it is imperative to align such endeavours with the strategic direction and long-term objectives of the organization.
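As an illustration of the source code mutation idea described in this abstract, the following minimal sketch swaps `+` for `-` in a small function and checks whether a set of inputs distinguishes the mutant from the original. The operator choice, the `total` function, and the helper names are assumptions made for the example, not the thesis's actual mutation operators or tooling.

```python
import ast


class AddToSub(ast.NodeTransformer):
    """Hypothetical mutation operator: replace every `a + b` with `a - b`."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # mutate nested expressions first
        if isinstance(node.op, ast.Add):
            node.op = ast.Sub()
        return node


def load_function(tree, name):
    """Compile a module AST and return the function called `name`."""
    namespace = {}
    exec(compile(tree, "<mutant>", "exec"), namespace)
    return namespace[name]


# Toy program under mutation (an assumption for this sketch).
SOURCE = "def total(a, b):\n    return a + b\n"

original = load_function(ast.parse(SOURCE), "total")
mutant_tree = ast.fix_missing_locations(AddToSub().visit(ast.parse(SOURCE)))
mutant = load_function(mutant_tree, "total")


def is_killed(inputs):
    """True if at least one input makes the mutant's output differ."""
    return any(original(*args) != mutant(*args) for args in inputs)
```

A differing output on any input shows that the mutant is non-equivalent. The reverse does not hold: inputs that fail to distinguish the two programs prove nothing, which is one reason manual oversight of labels can still be needed.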
22

Low-Resource Natural Language Understanding in Task-Oriented Dialogue

Louvan, Samuel 11 March 2022
Task-oriented dialogue (ToD) systems need to interpret the user's input to understand the user's needs (intent) and corresponding relevant information (slots). This process is performed by a Natural Language Understanding (NLU) component, which maps the text utterance into a semantic frame representation, involving two subtasks: intent classification (text classification) and slot filling (sequence tagging). Typically, new domains and languages are regularly added to the system to support more functionalities. Collecting domain-specific data and performing fine-grained annotation of large amounts of data every time a new domain and language is introduced can be expensive. Thus, developing an NLU model that generalizes well across domains and languages with less labeled data (low-resource) is crucial and remains challenging. This thesis focuses on investigating transfer learning and data augmentation methods for low-resource NLU in ToD. Our first contribution is a study of the potential of non-conversational text as a source for transfer. Most transfer learning approaches assume labeled conversational data as the source task and adapt the NLU model to the target task. We show that leveraging similar tasks from non-conversational text improves performance on target slot filling tasks through multi-task learning in low-resource settings. Second, we propose a set of lightweight augmentation methods that apply data transformation on token and sentence levels through slot value substitution and syntactic manipulation. Despite its simplicity, the performance is comparable to deep learning-based augmentation models, and it is effective on six languages on NLU tasks. Third, we investigate the effectiveness of domain adaptive pre-training for zero-shot cross-lingual NLU. In terms of overall performance, continued pre-training in English is effective across languages. This result indicates that the domain knowledge learned in English is transferable to other languages. 
In addition, domain similarity is essential: we show that intermediate pre-training data that is more similar – in terms of data distribution – to the target dataset yields better performance.
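The slot value substitution mentioned in this abstract can be sketched roughly as follows, assuming BIO-tagged utterances and a dictionary of known values per slot type. The function name, the tagging scheme, and the example data are assumptions for illustration, not the thesis's implementation.

```python
import random


def substitute_slot_values(tokens, tags, slot_values, rng):
    """Replace each BIO-tagged slot span with another value of the same slot type.

    `slot_values` maps a slot name to a list of non-empty candidate strings
    (a hypothetical resource, e.g. collected from the training set).
    """
    new_tokens, new_tags = [], []
    i = 0
    while i < len(tokens):
        tag = tags[i]
        if tag.startswith("B-"):
            slot = tag[2:]
            j = i + 1
            # Extend the span over the following I- tags of the same slot.
            while j < len(tokens) and tags[j] == "I-" + slot:
                j += 1
            candidates = slot_values.get(slot, [])
            if candidates:
                replacement = rng.choice(candidates).split()
            else:
                replacement = tokens[i:j]  # no known values: keep the span
            new_tokens.extend(replacement)
            new_tags.extend(["B-" + slot] + ["I-" + slot] * (len(replacement) - 1))
            i = j
        else:
            new_tokens.append(tokens[i])
            new_tags.append(tag)
            i += 1
    return new_tokens, new_tags
```

For example, with `slot_values = {"city": ["paris"]}`, the utterance "book a flight to new york" tagged with a two-token `city` span becomes "book a flight to paris" with a single `B-city` tag, so the labels stay aligned with the new tokens.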
23

Convolutional Neural Networks for Classification of Metastatic Tissue in Lymph Nodes : How Does Cutout Affect the Performance of Convolutional Neural Networks for Biomedical Image Classification? / Convolutional Neural Networks för att klassificera förekomsten av metastatisk vävnad i lymfkörtlarna

Ericsson, Andreas, Döringer Kana, Filip January 2021
One in eight women will suffer from breast cancer in their lifetime, making it the most common type of cancer among women. Successful treatment depends heavily on identifying metastatic tissue, which is cancer found beyond the initial tumour. Deep learning has become an effective approach within biomedical analysis. However, its success depends heavily on large datasets. Data augmentation is a way to enhance datasets without requiring more annotated data. One way of doing this is the cutout method, which masks parts of an input image. Our research investigated how the cutout method could improve the performance of Convolutional Neural Networks for classifying metastatic tissue on the Patch Camelyon dataset. Our research showed that improvements in performance can be achieved by using the cutout method. Further, our research suggests that a non-label-preserving version of cutout, in which the part of the image that may contain metastases can also be masked, is better than a label-preserving version. The largest improvement in accuracy was seen when we used a randomly sized cutout mask. The experiment resulted in an increase in accuracy of 3.6 percentage points, from the baseline of 82.3% to 85.9%. The cutout method was also compared with, and used in conjunction with, other well-established data augmentation techniques. Our conclusion is that cutout can be a competitive form of data augmentation that can be used both with and without other data augmentation techniques.
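A minimal sketch of cutout with a randomly sized mask, the variant this abstract reports as most effective, might look like the following. It operates on a plain list-of-rows image; the function name, the square mask shape, the fill value, and the `max_size` parameter are assumptions for the example rather than the authors' configuration.

```python
import random


def cutout(image, max_size, rng, fill=0):
    """Return a copy of `image` (a list of rows) with one square region masked.

    The side length is drawn uniformly from 1..max_size and the mask centre
    is drawn uniformly over the image, so the mask may be clipped at a border.
    """
    height, width = len(image), len(image[0])
    size = rng.randint(1, max_size)
    cy, cx = rng.randrange(height), rng.randrange(width)
    y0, x0 = cy - size // 2, cx - size // 2
    top, bottom = max(0, y0), min(height, y0 + size)
    left, right = max(0, x0), min(width, x0 + size)

    masked = [row[:] for row in image]  # leave the input untouched
    for y in range(top, bottom):
        for x in range(left, right):
            masked[y][x] = fill
    return masked
```

Because the mask position ignores where any tissue of interest lies, this sketch corresponds to the non-label-preserving setting; a label-preserving variant would need to restrict the mask to regions that cannot change the label.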
24

Bayesian estimation of factor analysis models with incomplete data

Merkle, Edgar C. 10 October 2005
No description available.
25

Bayesian Probit Regression Models for Spatially-Dependent Categorical Data

Berrett, Candace 02 November 2010
No description available.
26

Enhancing Text Readability Using Deep Learning Techniques

Alkaldi, Wejdan 20 July 2022
In the information era, reading is becoming more important for keeping up with the growing amount of knowledge. The ability to read a document varies from person to person depending on their skills and knowledge. It also depends on whether the readability level of the text matches the reader’s level. In this thesis, we propose a system that uses state-of-the-art machine learning and deep learning technology to classify and simplify a text, taking the reader’s reading level into consideration. The system classifies any text into its equivalent readability level. If the text’s readability level is higher than the reader’s level, i.e. the text is too difficult to read, the system performs text simplification to meet the desired readability level. The classification and simplification models are trained on data annotated with readability levels from the Newsela corpus. The trained simplification model operates at sentence level to simplify a given text to match a specific readability level. Moreover, the trained classification model is used to classify additional unlabelled sentences from the Wikipedia Corpus and the Mechanical Turk Corpus in order to enrich the text simplification dataset. The augmented dataset is then used to improve the quality of the simplified sentences. The system generates simplified versions of a text based on the desired readability levels. This can help people with low literacy to read and understand any documents they need. It can also be beneficial to educators who assist readers with different reading levels.
27

Data Augmentation Approaches for Automatic Speech Recognition Using Text-to-Speech / 音声認識のための音声合成を用いたデータ拡張手法

Ueno, Sei 23 March 2022
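Kyoto University / New-system course doctorate / Doctor of Informatics / Degree No. Kō 24027 / Informatics Doctorate No. 783 / 新制||情||133 (University Library) / Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University / (Chief examiner) Professor Tatsuya Kawahara, Professor Sadao Kurohashi, Professor Ko Nishino / Qualified under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Informatics / Kyoto University / DFAM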
28

Maintenance Data Augmentation, using Markov Chain Monte Carlo Simulation : (Hamiltonian MCMC using NUTS)

Roohani, Muhammad Ammar January 2024
Reliable and efficient utilization and operation of any engineering asset requires carefully designed maintenance planning. Maintenance-related data in the form of failure times, repair times, Mean Time Between Failures (MTBF), conditioning data, etc. plays a pivotal role in maintenance decision support. With the advancement of data analytics and industrial artificial intelligence, maintenance-related data is being used in maintenance prognostics modeling to predict future maintenance requirements, which form the basis of maintenance design and planning in any maintenance-conscious industry such as railways. The lack of such data creates a number of different problems in data-driven prognostics modeling. Researchers have employed a few methods to counter the problems caused by this lack of data. The proposed methodology uses a data augmentation technique based on Markov Chain Monte Carlo (MCMC) simulation to enhance maintenance data for use in maintenance prognostics modeling, which can serve as a basis for better maintenance decision support and planning.
29

Methods for data and user efficient annotation for multi-label topic classification / Effektiva annoteringsmetoder för klassificering med multipla klasser

Miszkurka, Agnieszka January 2022
Machine Learning models trained using supervised learning can achieve great results when a sufficient amount of labeled data is used. However, the annotation process is a costly and time-consuming task. Many methods have been devised to make the annotation pipeline more user and data efficient. This thesis explores techniques from the Active Learning, Zero-shot Learning, and Data Augmentation domains, as well as pre-annotation with revision, in which the annotator verifies or revises a class proposed by the system, in the context of multi-label classification. Active Learning's goal is to choose the most informative samples for labeling. As a state-of-the-art Active Learning technique, Contrastive Active Learning was adapted to the multi-label case. Once there is some labeled data, we can augment samples to make the dataset more diverse. English-German-English Backtranslation was used to perform Data Augmentation. Zero-shot learning is a setup in which a Machine Learning model can make predictions for classes it was not trained to predict. Zero-shot via Textual Entailment was leveraged in this study and its usefulness for pre-annotation with revision was reported. The results on the Reviews of Electric Vehicle Charging Stations dataset show that it may be beneficial to use Active Learning and Data Augmentation in the annotation pipeline. Active Learning methods such as Contrastive Active Learning can identify samples belonging to the rarest classes, while Data Augmentation via Backtranslation can improve performance, especially when little training data is available. The results of the Zero-shot Learning via Textual Entailment experiments show that this technique is not suitable for a production environment.
30

Nonparametric Mixture Modeling on Constrained Spaces

Putu Ayu G Sudyanti (7038110) 16 August 2019
Mixture modeling is a classical unsupervised learning method with applications to clustering and density estimation. This dissertation studies two challenges in modeling data with mixture models. The first part addresses problems that arise when modeling observations lying on constrained spaces, such as the boundaries of a city or a landmass. It is often desirable to model such data with mixture models, especially nonparametric mixture models. Specifying the component distributions and evaluating normalization constants raise modeling and computational challenges. In particular, the likelihood is an intractable quantity, and Bayesian inference over the parameters of these models results in posterior distributions that are doubly intractable. We address this problem via a model based on rejection sampling and an algorithm based on data augmentation. Our approach is to specify such models as restrictions of standard, unconstrained distributions to the constraint set, with measurements from the model simulated by a rejection sampling algorithm. Posterior inference proceeds by Markov chain Monte Carlo, first imputing the rejected samples given mixture parameters and then resampling parameters given all samples. We study two modeling approaches: mixtures of truncated Gaussians and truncated mixtures of Gaussians, along with Markov chain Monte Carlo sampling algorithms for both. We also discuss variations of the models, as well as approximations to improve mixing, reduce computational cost, and lower variance.

The second part of this dissertation explores the application of mixture models to estimate contamination rates in matched tumor and normal samples. Bulk sequencing of tumor samples is prone to contamination from normal cells, which leads to difficulties and inaccuracies in determining the mutational landscape of the cancer genome. In such instances, a matched normal sample from the same patient can be used as a control for germline mutations. Probabilistic models are popular in this context due to their flexibility. We propose a hierarchical Bayesian model to denoise the contamination in such data and detect somatic mutations in tumor cell populations. We explore the use of a Dirichlet prior on the contamination level and extend this to a framework of Dirichlet processes. We discuss MCMC schemes to sample from the joint posterior distribution and evaluate their performance on both synthetic experiments and publicly available data.
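The rejection sampling construction described in the first part of this abstract can be illustrated with a one-dimensional sketch: proposals are drawn from an unconstrained Gaussian and discarded until one falls inside the constraint set. The function name and the interval constraint are assumptions for the example; the dissertation's models are mixtures on more general constrained spaces.

```python
import random


def sample_truncated_gaussian(mean, std, in_constraint, rng):
    """Draw from N(mean, std^2) restricted to a constraint set.

    Returns the accepted draw together with the rejected proposals. The
    rejected proposals play the role of the auxiliary variables that a data
    augmentation scheme would impute during posterior inference.
    """
    rejected = []
    while True:
        x = rng.gauss(mean, std)
        if in_constraint(x):
            return x, rejected
        rejected.append(x)
```

Keeping the rejected proposals alongside the accepted draw is the point of the sketch: conditioned on them, the joint density no longer involves the intractable normalization constant of the truncated distribution.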
