31

Enhancing Text Readability Using Deep Learning Techniques

Alkaldi, Wejdan 20 July 2022 (has links)
In the information era, reading has become ever more important for keeping up with the growing amount of knowledge. The ability to read a document varies from person to person depending on their skills and knowledge. It also depends on the readability level of the text, and whether that level matches the reader's. In this thesis, we propose a system that uses state-of-the-art machine learning and deep learning techniques to classify and simplify a text while taking the reader's reading level into consideration. The system classifies any text into its equivalent readability level. If the text's readability level is higher than the reader's, i.e. too difficult to read, the system performs text simplification to meet the desired readability level. The classification and simplification models are trained on data annotated with readability levels from the Newsela corpus. The trained simplification model operates at the sentence level, simplifying a given text to match a specific readability level. Moreover, the trained classification model is used to classify additional unlabelled sentences from the Wikipedia Corpus and the Mechanical Turk Corpus in order to enrich the text simplification dataset. The augmented dataset is then used to improve the quality of the simplified sentences. The system generates simplified versions of a text based on the desired readability levels. This can help people with low literacy read and understand any documents they need. It can also benefit educators who assist readers at different reading levels.
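A minimal sketch of the classify-then-simplify loop this abstract describes, using the Hugging Face transformers pipelines; the checkpoint names and the "LEVEL_n" label convention are placeholders, not the thesis's actual models:

```python
from transformers import pipeline

# Hypothetical checkpoints standing in for the thesis's Newsela-trained
# models; the label format ("LEVEL_7") is likewise an assumed convention.
classify = pipeline("text-classification", model="my-org/newsela-readability")
simplify = pipeline("text2text-generation", model="my-org/newsela-simplifier")

def adapt(text: str, reader_level: int) -> str:
    """Return the text as-is if it already matches the reader's level,
    otherwise simplify it sentence by sentence to the target level."""
    level = int(classify(text)[0]["label"].split("_")[-1])
    if level <= reader_level:
        return text
    simplified = []
    for sentence in text.split(". "):
        out = simplify(f"simplify to level {reader_level}: {sentence}")
        simplified.append(out[0]["generated_text"])
    return ". ".join(simplified)
```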
32

Data Augmentation Approaches for Automatic Speech Recognition Using Text-to-Speech / 音声認識のための音声合成を用いたデータ拡張手法

Ueno, Sei 23 March 2022 (has links)
Kyoto University / New-system doctoral program / Doctor of Informatics / 甲第24027号 / 情博第783号 / 新制||情||133 (University Library) / Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University / (Chief examiner) Professor Tatsuya Kawahara, Professor Sadao Kurohashi, Professor Ko Nishino / Qualified under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Informatics / Kyoto University / DFAM
33

Maintenance Data Augmentation, using Markov Chain Monte Carlo Simulation : (Hamiltonian MCMC using NUTS)

Roohani, Muhammad Ammar January 2024 (has links)
Reliable and efficient utilization and operation of any engineering asset requires carefully designed maintenance planning, and maintenance-related data in the form of failure times, repair times, Mean Time Between Failures (MTBF), and condition data play a pivotal role in maintenance decision support. With the advancement of data analytics and industrial artificial intelligence, maintenance-related data are being used in prognostics modeling to predict future maintenance requirements, which form the basis of maintenance design and planning in any maintenance-conscious industry such as railways. A lack of such data creates a number of different problems in data-driven prognostics modelling, and researchers have employed a few methods to counter them. The proposed methodology involves a data augmentation technique using Markov Chain Monte Carlo (MCMC) simulation to enhance maintenance data for use in maintenance prognostics modeling, which can serve as a basis for better maintenance decision support and planning.
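As a concrete illustration of MCMC-based maintenance-data augmentation, the sketch below fits a Weibull time-to-failure model to a handful of observed failure times with NUTS and then draws posterior-predictive samples as augmented data. The Weibull likelihood, the priors, and the NumPyro implementation are assumptions for illustration, not the thesis's exact model:

```python
import jax.numpy as jnp
from jax import random
import numpyro
import numpyro.distributions as dist
from numpyro.infer import MCMC, NUTS, Predictive

def model(times=None, n=10):
    # Weak log-normal priors on the Weibull shape and scale (assumed).
    k = numpyro.sample("shape", dist.LogNormal(0.0, 0.5))
    lam = numpyro.sample("scale", dist.LogNormal(5.0, 1.0))
    with numpyro.plate("obs", n):
        numpyro.sample("t", dist.Weibull(lam, k), obs=times)

# A small, sparse set of observed failure times (illustrative numbers).
observed = jnp.array([410.0, 520.0, 610.0, 700.0, 880.0])
mcmc = MCMC(NUTS(model), num_warmup=500, num_samples=1000)
mcmc.run(random.PRNGKey(0), times=observed, n=observed.shape[0])

# Posterior-predictive draws: synthetic failure times consistent with the
# observed data, usable as augmented input to prognostics models.
augmented = Predictive(model, mcmc.get_samples())(
    random.PRNGKey(1), times=None, n=50)["t"]
```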
34

Methods for data and user efficient annotation for multi-label topic classification / Effektiva annoteringsmetoder för klassificering med multipla klasser

Miszkurka, Agnieszka January 2022 (has links)
Machine Learning models trained using supervised learning can achieve great results when a sufficient amount of labeled data is used. However, the annotation process is a costly and time-consuming task. There are many methods devised to make the annotation pipeline more user and data efficient. This thesis explores techniques from the Active Learning, Zero-shot Learning, and Data Augmentation domains, as well as pre-annotation with revision, in the context of multi-label classification. Active Learning's goal is to choose the most informative samples for labeling. Contrastive Active Learning, a state-of-the-art Active Learning technique, was adapted to the multi-label case. Once there is some labeled data, we can augment samples to make the dataset more diverse. English-German-English Backtranslation was used to perform Data Augmentation. Zero-shot learning is a setup in which a Machine Learning model can make predictions for classes it was not trained to predict. Zero-shot via Textual Entailment was leveraged in this study and its usefulness for pre-annotation with revision was reported. The results on the Reviews of Electric Vehicle Charging Stations dataset show that it may be beneficial to use Active Learning and Data Augmentation in the annotation pipeline. Active Learning methods such as Contrastive Active Learning can identify samples belonging to the rarest classes, while Data Augmentation via Backtranslation can improve performance especially when little training data is available. The results for Zero-shot Learning via Textual Entailment experiments show that this technique is not suitable for the production environment. / Klassificeringsmodeller som tränas med övervakad inlärning kan uppnå goda resultat när en tillräcklig mängd annoterad data används. Annoteringsprocessen är dock en kostsam och tidskrävande uppgift. Det finns många metoder utarbetade för att göra annoteringspipelinen mer användar- och dataeffektiv. Detta examensarbete utforskar tekniker från områdena Active Learning, Zero-shot Learning, Data Augmentation, samt pre-annotering, där annoterarens roll är att verifiera eller revidera en klass föreslagen av systemet. Målet med Active Learning är att välja de mest informativa datapunkterna för annotering. Contrastive Active Learning utökades till fallet där en datapunkt kan tillhöra flera klasser. Om det redan finns några annoterade data kan vi utöka datamängden med artificiella datapunkter, med syfte att göra datasetet mer mångsidigt. Engelsk-Tysk-Engelsk översättning användes för att konstruera sådana artificiella datapunkter. Zero-shot-inlärning är en teknik i vilken en maskininlärningsmodell kan göra förutsägelser för klasser som den inte var tränad att förutsäga. Zero-shot via Textual Entailment utnyttjades i denna studie för att utöka datamängden med artificiella datapunkter. Resultat från datamängden “Reviews of Electric Vehicle Charging Stations” visar att det kan vara fördelaktigt att använda Active Learning och Data Augmentation i annoteringspipelinen. Active Learning-metoder som Contrastive Active Learning kan identifiera datapunkter som tillhör de mest sällsynta klasserna, medan Data Augmentation via Backtranslation kan förbättra klassificerarens prestanda, särskilt när få träningsdata finns tillgänglig. Resultaten för Zero-shot Learning visar att denna teknik inte är lämplig för en produktionsmiljö.
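The zero-shot-via-textual-entailment setup evaluated in the thesis can be reproduced in a few lines with the transformers zero-shot pipeline, which rephrases each candidate label as an entailment hypothesis and scores it with an NLI model; the model choice and example labels here are illustrative:

```python
from transformers import pipeline

# NLI-based zero-shot classifier; bart-large-mnli is a common public choice,
# not necessarily the model used in the thesis.
clf = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

result = clf(
    "The charger was broken and the app refused to start a session.",
    candidate_labels=["reliability", "payment", "location", "pricing"],
    multi_label=True,  # multi-label case: each label is scored independently
)
print(result["labels"], result["scores"])
```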
35

Nonparametric Mixture Modeling on Constrained Spaces

Putu Ayu G Sudyanti (7038110) 16 August 2019 (has links)
Mixture modeling is a classical unsupervised learning method with applications to clustering and density estimation. This dissertation studies two challenges in modeling data with mixture models. The first part addresses problems that arise when modeling observations lying on constrained spaces, such as the boundaries of a city or a landmass. It is often desirable to model such data through the use of mixture models, especially nonparametric mixture models. Specifying the component distributions and evaluating normalization constants raise modeling and computational challenges. In particular, the likelihood forms an intractable quantity, and Bayesian inference over the parameters of these models results in posterior distributions that are doubly-intractable. We address this problem via a model based on rejection sampling and an algorithm based on data augmentation. Our approach is to specify such models as restrictions of standard, unconstrained distributions to the constraint set, with measurements from the model simulated by a rejection sampling algorithm. Posterior inference proceeds by Markov chain Monte Carlo, first imputing the rejected samples given mixture parameters and then resampling parameters given all samples. We study two modeling approaches: mixtures of truncated Gaussians and truncated mixtures of Gaussians, along with Markov chain Monte Carlo sampling algorithms for both. We also discuss variations of the models, as well as approximations to improve mixing, reduce computational cost, and lower variance.

The second part of this dissertation explores the application of mixture models to estimate contamination rates in matched tumor and normal samples. Bulk sequencing of tumor samples is prone to contamination from normal cells, which leads to difficulties and inaccuracies in determining the mutational landscape of the cancer genome. In such instances, a matched normal sample from the same patient can be used as a control for germline mutations. Probabilistic models are popular in this context due to their flexibility. We propose a hierarchical Bayesian model to denoise the contamination in such data and detect somatic mutations in tumor cell populations. We explore the use of a Dirichlet prior on the contamination level and extend this to a framework of Dirichlet processes. We discuss MCMC schemes to sample from the joint posterior distribution and evaluate its performance on both synthetic experiments and publicly available data.
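A toy version of the generative mechanism described above: draw from an unconstrained Gaussian and keep only points inside the constraint set. The unit-disk constraint is a stand-in for a real boundary such as a city outline; the rejected draws are exactly what the dissertation's data-augmentation scheme imputes during posterior inference:

```python
import numpy as np

rng = np.random.default_rng(0)

def in_constraint(x):
    # Toy constraint set: the unit disk, standing in for a real boundary.
    return np.sum(x**2) <= 1.0

def sample_truncated_gaussian(mu, cov, n):
    """Rejection sampler: draw from the unconstrained Gaussian and keep
    points inside the constraint set; accepted draws follow the Gaussian
    restricted to the set."""
    out = []
    while len(out) < n:
        x = rng.multivariate_normal(mu, cov)
        if in_constraint(x):
            out.append(x)
    return np.array(out)

samples = sample_truncated_gaussian(np.array([0.3, 0.0]), 0.25 * np.eye(2), 500)
```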
36

Machine learning and augmented data for automated treatment planning in complex external beam radiation therapy

Lempart, Michael January 2019 (has links)
External beam radiation therapy is currently one of the most commonly used modalities for treating cancer. With the rise of new technologies and increasing computational power, machine learning, deep learning and artificial intelligence applications for classification and regression problems have begun to find their way into the field of radiation oncology. One such application is the automated generation of radiotherapy treatment plans, which must be optimized for every single patient. The department of radiation physics in Lund, Sweden, has developed autoplanning software which, in combination with a commercially available treatment planning system (TPS), can be used to create clinical treatment plans automatically. The parameters of a multivariable cost function are changed iteratively, making it possible to generate a great number of different treatment plans for a single patient. The output comprises optimal, near-optimal, clinically acceptable or even non-acceptable treatment plans. This thesis evaluates the possibility of using machine and deep learning both to reduce the number of treatment plans the autoplanning software must generate and to find cost function parameters that lead to clinically acceptable, optimal or near-optimal plans. Data augmentation is used to create matrices of optimal treatment plan parameters, which are stored in a training database. Patient-specific training features are extracted from the TPS, as well as from the bottleneck layer of a trained deep neural network autoencoder. The training features are then matched against the same features extracted for test patients, using a k-nearest neighbor algorithm. Finally, treatment plans for a new patient are generated using the output plan parameter matrices of its nearest neighbors. This reduces computation time and helps find suitable cost function parameters for a new patient.
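The retrieval step described above amounts to a nearest-neighbor lookup in feature space; a minimal scikit-learn sketch, where the feature arrays, file names, and the choice of five neighbors are placeholders:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Training database: per-patient feature vectors (TPS features concatenated
# with autoencoder bottleneck activations) and the matching matrices of
# optimal cost-function parameters. File names are hypothetical.
train_features = np.load("train_features.npy")     # (n_patients, n_features)
plan_parameters = np.load("plan_parameters.npy")   # (n_patients, n_params)
new_patient = np.load("new_patient_features.npy")  # (n_features,)

knn = NearestNeighbors(n_neighbors=5).fit(train_features)
_, idx = knn.kneighbors(new_patient.reshape(1, -1))

# Candidate cost-function parameters for the new patient: those stored for
# its five nearest neighbors, to be fed back into the autoplanning software.
candidates = plan_parameters[idx[0]]
```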
37

End-to-End Full-Page Handwriting Recognition

Wigington, Curtis Michael 01 May 2018 (has links)
Despite decades of research, offline handwriting recognition (HWR) of historical documents remains a challenging problem, which if solved could greatly improve the searchability of online cultural heritage archives. Historical documents are plagued with noise, degradation, ink bleed-through, overlapping strokes, variation in slope and slant of the writing, and inconsistent layouts. Often the documents in a collection have been written by thousands of authors, all of whom have significantly different writing styles. In order to better capture the variations in writing styles we introduce a novel data augmentation technique. This method achieves state-of-the-art results on modern datasets written in English and French and a historical dataset written in German. HWR models are often limited by the accuracy of the preceding steps of text detection and segmentation. Motivated by this, we present a deep learning model that jointly learns text detection, segmentation, and recognition using mostly images without detection or segmentation annotations. Our Start, Follow, Read (SFR) model is composed of a Region Proposal Network to find the start position of handwriting lines, a novel line follower network that incrementally follows and preprocesses lines of (perhaps curved) handwriting into dewarped images, and a CNN-LSTM network to read the characters. SFR exceeds the performance of the winner of the ICDAR2017 handwriting recognition competition, even when not using the provided competition region annotations.
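For the final reading stage, a CNN-LSTM line recognizer trained with CTC loss is the standard construction; the PyTorch sketch below shows the shape of such a network, with layer sizes and the character count chosen for illustration rather than taken from the SFR paper:

```python
import torch
import torch.nn as nn

class LineRecognizer(nn.Module):
    """CNN feature extractor over a dewarped line image, followed by a
    bidirectional LSTM and a per-timestep character classifier (CTC head).
    Hyperparameters are illustrative, not SFR's exact architecture."""
    def __init__(self, n_chars: int, hidden: int = 256):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.lstm = nn.LSTM(128 * 8, hidden, num_layers=2,
                            bidirectional=True, batch_first=True)
        self.fc = nn.Linear(2 * hidden, n_chars + 1)  # +1 for the CTC blank

    def forward(self, x):                 # x: (B, 1, 32, W) line image
        f = self.cnn(x)                   # (B, 128, 8, W/4)
        f = f.permute(0, 3, 1, 2).flatten(2)   # (B, W/4, 128*8)
        out, _ = self.lstm(f)
        return self.fc(out).log_softmax(-1)    # (B, W/4, n_chars+1)

# Training-step sketch with dummy data and CTC loss.
model = LineRecognizer(n_chars=80)
ctc = nn.CTCLoss(blank=80, zero_infinity=True)
imgs = torch.randn(4, 1, 32, 256)
logp = model(imgs).permute(1, 0, 2)       # CTC expects (T, B, C)
targets = torch.randint(0, 80, (4, 20))   # dummy character-index labels
loss = ctc(logp, targets,
           input_lengths=torch.full((4,), logp.size(0), dtype=torch.long),
           target_lengths=torch.full((4,), 20, dtype=torch.long))
loss.backward()
```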
38

Efficient Bayesian Inference for Multivariate Factor Stochastic Volatility Models

Kastner, Gregor, Frühwirth-Schnatter, Sylvia, Lopes, Hedibert Freitas 24 February 2016 (has links) (PDF)
We discuss efficient Bayesian estimation of dynamic covariance matrices in multivariate time series through a factor stochastic volatility model. In particular, we propose two interweaving strategies (Yu and Meng, Journal of Computational and Graphical Statistics, 20(3), 531-570, 2011) to substantially accelerate convergence and mixing of standard MCMC approaches. Similar to marginal data augmentation techniques, the proposed acceleration procedures exploit non-identifiability issues which frequently arise in factor models. Our new interweaving strategies are easy to implement and come at almost no extra computational cost; nevertheless, they can boost estimation efficiency by several orders of magnitude as is shown in extensive simulation studies. To conclude, the application of our algorithm to a 26-dimensional exchange rate data set illustrates the superior performance of the new approach for real-world data. / Series: Research Report Series / Department of Statistics and Mathematics
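The non-identifiability these interweaving strategies exploit is visible in the likelihood itself: rescaling the loadings and the factors by reciprocal constants leaves the fitted covariance unchanged, so an MCMC move along that direction is cheap but can decorrelate the chain. A quick numerical check of the invariance:

```python
import numpy as np

# In a one-factor model y_t = Lambda * f_t + e_t, the transformation
# (Lambda, f_t) -> (c * Lambda, f_t / c) leaves Lambda @ f unchanged.
rng = np.random.default_rng(1)
Lambda = rng.normal(size=(26, 1))   # loadings for 26 series (as in the paper's data)
f = rng.normal(size=(1, 500))       # latent factor path
c = 3.7
assert np.allclose(Lambda @ f, (c * Lambda) @ (f / c))
```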
39

O valor futuro de cada cliente : estimação do Customer Lifetime Value

Silveira, Rodrigo Heldt January 2014 (has links)
A capacidade de o marketing mensurar e comunicar o valor de suas atividades e investimentos tem sido uma das prioridades de pesquisa na área nos últimos anos. Para atingir esse objetivo, a capacidade de mensurar adequadamente os ativos de marketing, como o Customer Lifetime Value e, de forma agregada, o Customer Equity, torna-se essencial, pois esses ativos são considerados os elementos capazes de traduzir em valores monetários o resultado dos diversos investimentos realizados pela área de marketing. Diante da mensuração desses valores, é possível o planejamento e a realização de ações mais precisas por parte dos profissionais de marketing. Sendo assim, no presente estudo objetivou-se construir e aplicar um modelo de estimação de Customer Lifetime Value no modo bottom-up (individual por cliente) em uma amostra de clientes de uma empresa do setor de serviços financeiros. O modelo bayesiano hierárquico aplicado, com três regressões estruturadas conforme o modelo Seemingly Unrelated Regressions (SUR) (ZELLNER, 1971), foi construído a partir dos trabalhos de Kumar et al. (2008), Kumar e Shah (2009) e Cowles, Carlin e Connett (1996). Os resultados evidenciaram (1) que o modelo foi capaz de estimar com consistência o valor futuro de 84% dos clientes analisados; (2) que esse valor estimado traduz o potencial de rentabilidade que pode ser esperado futuramente para cada cliente; (3) que a base de clientes pode ser segmentada a partir do Customer Lifetime Value. Diante do conhecimento do valor futuro de cada cliente, se vislumbrou possibilidades de ações que tragam melhorias para gestão de clientes tradicionalmente utilizada, principalmente no que diz respeito à alocação dos recursos de marketing. / The marketing capacity to measure and communicate the value resulting from its activities and investments has been one of the area's top research priorities in recent years. In order to achieve this objective, the capacity to appropriately measure marketing assets, such as the Customer Lifetime Value and, in aggregate form, the Customer Equity, has been pointed out as essential, because these assets are considered elements capable of translating the results of marketing investments into monetary values. Given the measurement of those values, marketers become able to plan and take more precise actions. Thus, the objective of the present study is to build and test a bottom-up Customer Lifetime Value estimation model on a sample of customers from a financial services company. The Bayesian hierarchical model, composed of three regressions structured according to the Seemingly Unrelated Regressions (SUR) model (ZELLNER, 1971), was built from the works of Kumar et al. (2008), Kumar and Shah (2009) and Cowles, Carlin and Connett (1996). The results show that (1) the model was capable of estimating the future value of 84% of the analyzed customers consistently; (2) these estimated future values indicate the potential profitability of each customer; (3) the customer base can be segmented by Customer Lifetime Value. Given the knowledge obtained about the future value of each customer and the segments established, several actions that can bring improvements to the traditional way of managing customers were suggested, especially concerning marketing resource allocation.
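A SUR system in the sense of Zellner (1971) can be estimated classically by feasible GLS, which conveys the structure the dissertation's Bayesian sampler works with; a minimal numpy sketch under the assumption of equal sample sizes across equations, not the hierarchical model itself:

```python
import numpy as np
from scipy.linalg import block_diag

def sur_fgls(ys, Xs):
    """Feasible GLS for a Seemingly Unrelated Regressions system:
    per-equation OLS residuals estimate the cross-equation error
    covariance, which then weights a joint GLS step.
    ys: list of (n,) response vectors; Xs: list of (n, k_i) designs."""
    n = ys[0].shape[0]
    # Stage 1: equation-by-equation OLS residuals.
    resid = np.column_stack([
        y - X @ np.linalg.lstsq(X, y, rcond=None)[0] for y, X in zip(ys, Xs)
    ])
    sigma = resid.T @ resid / n          # cross-equation error covariance
    # Stage 2: joint GLS on the stacked system, weight Sigma^{-1} kron I_n.
    X_all = block_diag(*Xs)
    y_all = np.concatenate(ys)
    W = np.kron(np.linalg.inv(sigma), np.eye(n))
    return np.linalg.solve(X_all.T @ W @ X_all, X_all.T @ W @ y_all)
```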
