31 |
Bayesian estimation of factor analysis models with incomplete data
Merkle, Edgar C. 10 October 2005
No description available.
|
32 |
Bayesian Probit Regression Models for Spatially-Dependent Categorical Data
Berrett, Candace 02 November 2010
No description available.
|
33 |
Enhancing Text Readability Using Deep Learning Techniques
Alkaldi, Wejdan 20 July 2022
In the information era, reading has become more important for keeping up with the growing amount of knowledge. The ability to read a document varies from person to person depending on their skills and knowledge. It also depends on the readability level of the text and whether it matches the reader’s level. In this thesis, we propose a system that uses state-of-the-art machine learning and deep learning techniques to classify and simplify a text, taking into consideration the reader’s reading level. The system assigns any text to its equivalent readability level. If the text's readability level is higher than the reader’s level, i.e. the text is too difficult to read, the system performs text simplification to meet the desired readability level. The classification and simplification models are trained on data annotated with readability levels from the Newsela corpus. The trained simplification model operates at sentence level, simplifying a given text to match a specific readability level. Moreover, the trained classification model is used to classify additional unlabelled sentences from the Wikipedia Corpus and the Mechanical Turk Corpus in order to enrich the text simplification dataset. The augmented dataset is then used to improve the quality of the simplified sentences. The system generates simplified versions of a text based on the desired readability levels. This can help people with low literacy to read and understand any documents they need. It can also benefit educators who assist readers with different reading levels.
|
34 |
Data Augmentation Approaches for Automatic Speech Recognition Using Text-to-Speech / 音声認識のための音声合成を用いたデータ拡張手法
Ueno, Sei 23 March 2022
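Kyoto University / Doctorate by coursework (new system) / Doctor of Informatics / 甲第24027号 / 情博第783号 / 新制||情||133 (University Library) / Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University / (Chief examiner) Professor Tatsuya Kawahara, Professor Sadao Kurohashi, Professor Ko Nishino / Conferred under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Informatics / Kyoto University / DFAM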
|
35 |
Maintenance Data Augmentation, using Markov Chain Monte Carlo Simulation : (Hamiltonian MCMC using NUTS)
Roohani, Muhammad Ammar January 2024
Reliable and efficient utilization and operation of any engineering asset requires carefully designed maintenance planning, and maintenance-related data in the form of failure times, repair times, Mean Time Between Failure (MTBF), conditioning data, etc. plays a pivotal role in maintenance decision support. With the advancement of data analytics and industrial artificial intelligence, maintenance-related data is being used in maintenance prognostics modeling to predict future maintenance requirements, which form the basis of maintenance design and planning in any maintenance-conscious industry such as railways. The lack of such data creates a number of different problems in data-driven prognostics modelling. Researchers have employed a few methods to counter the problems caused by the lack of available data. The proposed methodology uses a data augmentation technique based on Markov Chain Monte Carlo (MCMC) simulation to enhance maintenance data for use in maintenance prognostics modeling, which can serve as a basis for better maintenance decision support and planning.
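The idea of augmenting scarce maintenance data with MCMC can be illustrated with a minimal sketch. It is not the thesis's method: it assumes exponentially distributed failure times with a Gamma(1, 1) prior on the failure rate, and it swaps in a plain random-walk Metropolis sampler for the Hamiltonian/NUTS sampler named in the title. The observed failure times and all function names are hypothetical.

```python
import math
import random


def log_target(theta, n, total_time, a=1.0, b=1.0):
    """Log posterior density of theta = log(rate), up to a constant.

    Assumes exponential failure times and a Gamma(a, b) prior on the rate;
    the extra theta term is the Jacobian of the log transform.
    """
    return (a + n) * theta - (b + total_time) * math.exp(theta)


def metropolis_rates(failure_times, n_samples, step, rng):
    """Random-walk Metropolis on log(rate); returns the chain of rates."""
    n, total = len(failure_times), sum(failure_times)
    theta = math.log(n / total)  # start at the maximum-likelihood rate
    rates = []
    for _ in range(n_samples):
        proposal = theta + rng.gauss(0.0, step)
        log_ratio = log_target(proposal, n, total) - log_target(theta, n, total)
        # Accept with probability min(1, exp(log_ratio)).
        if rng.random() < math.exp(min(0.0, log_ratio)):
            theta = proposal
        rates.append(math.exp(theta))
    return rates


rng = random.Random(1)
observed = [120.0, 340.0, 95.0, 410.0, 230.0]  # hypothetical failure times (hours)
rates = metropolis_rates(observed, n_samples=5000, step=0.5, rng=rng)

# Discard burn-in, thin the chain, and draw one synthetic failure time per
# retained posterior sample (a posterior-predictive draw).
kept = rates[1000::10]
synthetic_times = [rng.expovariate(r) for r in kept]
posterior_mean_rate = sum(kept) / len(kept)
```

The synthetic times inherit the uncertainty in the rate, which is the reason for drawing from the posterior rather than from a single fitted value. A gradient-based sampler such as NUTS would replace only `metropolis_rates`.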
|
36 |
Generative Data Augmentation: Using DCGAN To Expand Training Datasets For Chest X-Ray Pneumonia Detection
Maier, Ryan D 01 June 2024
Recent advancements in computer vision have demonstrated remarkable success in image classification tasks, particularly when provided with an ample supply of accurately labeled images for training. These techniques have also exhibited significant potential in revolutionizing computer-aided medical diagnosis by enabling the segmentation and classification of medical images, leveraging Convolutional Neural Networks (CNNs) and similar models. However, the integration of such technologies into clinical practice faces notable challenges. Chief among these is the obstacle of acquiring high-quality medical imaging data for training purposes. Patient privacy concerns often hinder researchers from accessing large datasets, while less common medical conditions pose additional hurdles due to scarcity of relevant data. This study aims to address the issue of insufficient data availability in medical imaging analysis. We present experiments employing Deep Convolutional Generative Adversarial Networks (DCGANs) to augment training datasets of chest X-ray images, specifically targeting the identification of pneumonia-affected lungs using CNNs. Our findings demonstrate that DCGAN-based generative data augmentation consistently enhances classification performance, even when training sets are severely limited in size.
|
37 |
Methods for data and user efficient annotation for multi-label topic classification / Effektiva annoteringsmetoder för klassificering med multipla klasser
Miszkurka, Agnieszka January 2022
Machine Learning models trained using supervised learning can achieve great results when a sufficient amount of labeled data is used. However, the annotation process is a costly and time-consuming task. Many methods have been devised to make the annotation pipeline more user- and data-efficient. This thesis explores techniques from the Active Learning, Zero-shot Learning, and Data Augmentation domains, as well as pre-annotation with revision, in the context of multi-label classification. Active Learning's goal is to choose the most informative samples for labeling. Contrastive Active Learning, a state-of-the-art Active Learning technique, was adapted to the multi-label case. Once there is some labeled data, we can augment samples to make the dataset more diverse. English-German-English Backtranslation was used to perform Data Augmentation. Zero-shot learning is a setup in which a Machine Learning model can make predictions for classes it was not trained to predict. Zero-shot via Textual Entailment was leveraged in this study and its usefulness for pre-annotation with revision was reported. The results on the Reviews of Electric Vehicle Charging Stations dataset show that it may be beneficial to use Active Learning and Data Augmentation in the annotation pipeline. Active Learning methods such as Contrastive Active Learning can identify samples belonging to the rarest classes, while Data Augmentation via Backtranslation can improve performance, especially when little training data is available. The results of the Zero-shot Learning via Textual Entailment experiments show that this technique is not suitable for a production environment. / Classification models trained with supervised learning can achieve good results when a sufficient amount of annotated data is used. However, the annotation process is a costly and time-consuming task. Many methods have been devised to make the annotation pipeline more user- and data-efficient. This thesis explores techniques from the areas of Active Learning, Zero-shot Learning, and Data Augmentation, as well as pre-annotation, in which the annotator's role is to verify or revise a class proposed by the system. The goal of Active Learning is to choose the most informative data points for annotation. Contrastive Active Learning was extended to the case where a data point can belong to several classes. If some annotated data already exists, the dataset can be expanded with artificial data points to make it more diverse. English-German-English translation was used to construct such artificial data points. Zero-shot learning is a technique in which a machine learning model can make predictions for classes it was not trained to predict. Zero-shot via Textual Entailment was used in this study to expand the dataset with artificial data points. Results on the “Reviews of Electric Vehicle Charging Stations” dataset show that it can be beneficial to use Active Learning and Data Augmentation in the annotation pipeline. Active Learning methods such as Contrastive Active Learning can identify data points belonging to the rarest classes, while Data Augmentation via Backtranslation can improve the classifier's performance, especially when little training data is available. The results for Zero-shot Learning show that this technique is not suitable for a production environment.
|
38 |
Nonparametric Mixture Modeling on Constrained Spaces
Putu Ayu G Sudyanti (7038110) 16 August 2019
Mixture modeling is a classical unsupervised learning method with applications to clustering and density estimation. This dissertation studies two challenges in modeling data with mixture models. The first part addresses problems that arise when modeling observations lying on constrained spaces, such as the boundaries of a city or a landmass. It is often desirable to model such data through the use of mixture models, especially nonparametric mixture models. Specifying the component distributions and evaluating normalization constants raise modeling and computational challenges. In particular, the likelihood is an intractable quantity, and Bayesian inference over the parameters of these models results in posterior distributions that are doubly intractable. We address this problem via a model based on rejection sampling and an algorithm based on data augmentation. Our approach is to specify such models as restrictions of standard, unconstrained distributions to the constraint set, with measurements from the model simulated by a rejection sampling algorithm. Posterior inference proceeds by Markov chain Monte Carlo, first imputing the rejected samples given mixture parameters and then resampling parameters given all samples. We study two modeling approaches: mixtures of truncated Gaussians and truncated mixtures of Gaussians, along with Markov chain Monte Carlo sampling algorithms for both. We also discuss variations of the models, as well as approximations to improve mixing, reduce computational cost, and lower variance.

The second part of this dissertation explores the application of mixture models to estimate contamination rates in matched tumor and normal samples. Bulk sequencing of tumor samples is prone to contamination from normal cells, which leads to difficulties and inaccuracies in determining the mutational landscape of the cancer genome. In such instances, a matched normal sample from the same patient can be used to act as a control for germline mutations. Probabilistic models are popularly used in this context due to their flexibility. We propose a hierarchical Bayesian model to denoise the contamination in such data and detect somatic mutations in tumor cell populations. We explore the use of a Dirichlet prior on the contamination level and extend this to a framework of Dirichlet processes. We discuss MCMC schemes to sample from the joint posterior distribution and evaluate their performance on both synthetic experiments and publicly available data.
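The generative view in the first part, a standard distribution restricted to a constraint set and simulated by rejection sampling, can be sketched in a few lines. This is only an illustration under simplifying assumptions: a single one-dimensional Gaussian and an interval constraint stand in for the mixture components and spatial boundaries of the dissertation, and the function name is hypothetical. The rejected proposals are returned because they are the quantities that a data augmentation scheme of this kind treats as latent variables.

```python
import random


def sample_truncated_gaussian(mu, sigma, lower, upper, rng):
    """Draw one sample from N(mu, sigma^2) restricted to [lower, upper].

    Proposals are drawn from the unconstrained Gaussian and discarded until
    one lands inside the constraint set. Returns the accepted draw together
    with the list of rejected proposals.
    """
    rejected = []
    while True:
        x = rng.gauss(mu, sigma)
        if lower <= x <= upper:
            return x, rejected
        rejected.append(x)


rng = random.Random(0)
draws = [sample_truncated_gaussian(0.0, 1.0, -0.5, 2.0, rng) for _ in range(1000)]
accepted = [x for x, _ in draws]
n_rejected = sum(len(r) for _, r in draws)
```

Conditioning on both accepted and rejected draws removes the need to evaluate the truncated density's normalizing constant, since the joint sequence is just a run of unconstrained Gaussian draws.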
|
39 |
Machine learning and augmented data for automated treatment planning in complex external beam radiation therapy
Lempart, Michael January 2019
External beam radiation therapy is currently one of the most commonly used modalities for treating cancer. With the rise of new technologies and increasing computational power, machine learning, deep learning and artificial intelligence applications used for classification and regression problems have begun to find their way into the field of radiation oncology. One such application is the automated generation of radiotherapy treatment plans, which must be optimized for every single patient. The department of radiation physics in Lund, Sweden, has developed autoplanning software which, in combination with a commercially available treatment planning system (TPS), can be used for automatic creation of clinical treatment plans. The parameters of a multivariable cost function are changed iteratively, making it possible to generate a large number of different treatment plans for a single patient. The resulting plans may be optimal, near-optimal, clinically acceptable or even non-acceptable. This thesis evaluates the possibility of using machine and deep learning to minimize the number of treatment plans generated by the autoplanning software, as well as the possibility of finding cost function parameters that lead to clinically acceptable optimal or near-optimal plans. Data augmentation is used to create matrices of optimal treatment plan parameters, which are stored in a training database. Patient-specific training features are extracted from the TPS, as well as from the bottleneck layer of a trained deep neural network autoencoder. The training features are then matched against the same features extracted for test patients, using a k-nearest neighbor algorithm. Finally, treatment plans for a new patient are generated using the output plan parameter matrices of its nearest neighbors. This allows for a reduction in computation time as well as for finding suitable cost function parameters for a new patient.
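The matching step, finding the training patients whose features are closest to a new patient and reusing their plan parameters, can be sketched as follows. The feature vectors, parameter vectors, and function names are hypothetical, and averaging the neighbors' parameters is an assumption made for illustration; the abstract does not say how the neighbors' parameter matrices are combined.

```python
import math


def k_nearest(train_features, query, k):
    """Return indices of the k training vectors closest to query (Euclidean)."""
    dists = [(math.dist(f, query), i) for i, f in enumerate(train_features)]
    dists.sort()
    return [i for _, i in dists[:k]]


def average_parameters(param_sets, indices):
    """Element-wise mean of the parameter vectors at the given indices."""
    chosen = [param_sets[i] for i in indices]
    return [sum(col) / len(chosen) for col in zip(*chosen)]


# Hypothetical per-patient features and stored cost-function parameters.
train_features = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.9, 1.2]]
train_params = [[10.0, 1.0], [20.0, 2.0], [90.0, 9.0], [22.0, 2.0]]

neighbors = k_nearest(train_features, [1.0, 1.1], k=2)
suggested = average_parameters(train_params, neighbors)
```

In practice the features would usually be standardized before computing distances, so that no single feature dominates the Euclidean metric.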
|
40 |
End-to-End Full-Page Handwriting Recognition
Wigington, Curtis Michael 01 May 2018
Despite decades of research, offline handwriting recognition (HWR) of historical documents remains a challenging problem, which if solved could greatly improve the searchability of online cultural heritage archives. Historical documents are plagued with noise, degradation, ink bleed-through, overlapping strokes, variation in slope and slant of the writing, and inconsistent layouts. Often the documents in a collection have been written by thousands of authors, all of whom have significantly different writing styles. In order to better capture the variations in writing styles, we introduce a novel data augmentation technique. This method achieves state-of-the-art results on modern datasets written in English and French and a historical dataset written in German. HWR models are often limited by the accuracy of the preceding steps of text detection and segmentation. Motivated by this, we present a deep learning model that jointly learns text detection, segmentation, and recognition using mostly images without detection or segmentation annotations. Our Start, Follow, Read (SFR) model is composed of a Region Proposal Network to find the start position of handwriting lines, a novel line follower network that incrementally follows and preprocesses lines of (perhaps curved) handwriting into dewarped images, and a CNN-LSTM network to read the characters. SFR exceeds the performance of the winner of the ICDAR2017 handwriting recognition competition, even when not using the provided competition region annotations.
|