71

Training Neural Models for Abstractive Text Summarization

Kryściński, Wojciech January 2018
Abstractive text summarization aims to condense long textual documents into a short, human-readable form while preserving the most important information from the source document. A common approach to training summarization models is maximum likelihood estimation with the teacher-forcing strategy. Despite its popularity, this method has been shown to yield models with suboptimal performance at inference time. This work examines how using alternative, task-specific training signals affects the performance of summarization models. Two novel training signals are proposed and evaluated. The first, a novelty metric, measures the overlap between n-grams in the summary and the summarized article. The second utilizes a discriminator model to distinguish human-written summaries from generated ones on a word-level basis. Empirical results show that using these metrics as rewards for policy-gradient training yields significant performance gains as measured by ROUGE scores, novelty scores and human evaluation.
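As an illustration of how such a novelty signal might be computed, here is a minimal Python sketch, assuming the metric is the fraction of summary n-grams absent from the source article (the abstract only says it measures n-gram overlap; the exact formulation, tokenization and choice of n are assumptions):

```python
def ngrams(tokens, n):
    """Set of n-grams in a token sequence."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def novelty_score(summary_tokens, article_tokens, n=3):
    """Fraction of summary n-grams that do not appear in the article.
    Hypothetical formulation; the abstract only states that the signal
    measures n-gram overlap between summary and source."""
    summary_ngrams = ngrams(summary_tokens, n)
    if not summary_ngrams:
        return 0.0
    article_ngrams = ngrams(article_tokens, n)
    return len(summary_ngrams - article_ngrams) / len(summary_ngrams)

# A summary that copies the article verbatim scores 0.0 (no novelty)
print(novelty_score("the cat sat on the mat".split(),
                    "the cat sat on the mat all day".split()))
```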
72

Restaurant Daily Revenue Prediction: Utilizing Synthetic Time Series Data for Improved Model Performance

Jarlöv, Stella, Svensson Dahl, Anton January 2023
This study aims to enhance the accuracy of a demand forecasting model, XGBoost, by incorporating synthetic multivariate restaurant time series data during the training process. The research addresses the limited availability of training data by generating synthetic data using TimeGAN, a generative adversarial deep neural network tailored for time series data. A one-year daily time series dataset, comprising numerical and categorical features based on a real restaurant's sales history, supplemented by relevant external data, serves as the original data. TimeGAN learns from this dataset to create synthetic data that closely resembles the original data in terms of temporal and distributional dynamics. Statistical and visual analyses demonstrate a strong similarity between the synthetic and original data. To evaluate the usefulness of the synthetic data, an experiment is conducted where varying lengths of synthetic data are iteratively combined with the one-year real dataset. Each iteration involves retraining the XGBoost model and assessing its accuracy for a one-week forecast using the Root Mean Square Error (RMSE). The results indicate that incorporating 6 years of synthetic data improves the model's performance by 65%. The hyperparameter configurations suggest that deeper tree structures benefit the XGBoost model when synthetic data is added. Furthermore, the model exhibits improved feature selection with an increased amount of training data. This study demonstrates that incorporating synthetic data closely resembling the original data can effectively enhance the accuracy of predictive models, particularly when training data is limited.
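A rough sketch of the augmentation-and-evaluation loop described above; all array shapes, hyperparameters and the stand-in data below are placeholders rather than the study's actual configuration:

```python
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X_real, y_real = rng.normal(size=(365, 10)), rng.normal(size=365)        # stand-in: one year of daily features/revenue
X_syn, y_syn = rng.normal(size=(6 * 365, 10)), rng.normal(size=6 * 365)  # stand-in: six years of TimeGAN samples

# Augment the real training window with synthetic rows; hold out the final week
X_train = np.vstack([X_real[:-7], X_syn])
y_train = np.concatenate([y_real[:-7], y_syn])

model = xgb.XGBRegressor(max_depth=8, n_estimators=300)  # deeper trees, per the reported findings
model.fit(X_train, y_train)

pred = model.predict(X_real[-7:])
rmse = np.sqrt(np.mean((y_real[-7:] - pred) ** 2))       # one-week forecast RMSE
print(rmse)
```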
73

Self-Supervised One-Shot Learning for Automatic Segmentation of GAN-Generated Images

Ankit V Manerikar (16523988) 11 July 2023
Generative Adversarial Networks (GANs) have consistently defined the state of the art in the generative modelling of high-quality images in several applications. The images generated using GANs, however, do not lend themselves to being directly used in supervised learning tasks without first being curated through annotations. This dissertation investigates how to carry out automatic on-the-fly segmentation of GAN-generated images and how this can be applied to the problem of producing high-quality simulated data for X-ray-based security screening. The research exploits the hidden-layer properties of GAN models in a self-supervised learning framework for the automatic one-shot segmentation of images created by a style-based GAN. The framework consists of a novel contrastive learner that is based on a Sinkhorn distance-based clustering algorithm and that learns a compact feature space for per-pixel classification of the GAN-generated images. This facilitates faster learning of the feature vectors for one-shot segmentation and allows on-the-fly automatic annotation of the GAN images. We have tested our framework on a number of standard benchmarks (CelebA, PASCAL, LSUN) to yield a segmentation performance that not only exceeds the semi-supervised baselines by an average wIoU margin of 1.02% but also improves the inference speeds by a factor of 4.5. This dissertation also presents BagGAN, an extension of our framework to the problem domain of X-ray-based baggage screening. BagGAN produces annotated synthetic baggage X-ray scans to train machine-learning algorithms for the detection of prohibited items during security screening. We have compared the images generated by BagGAN with those created by deterministic ray-tracing models for X-ray simulation and observed that our GAN-based baggage simulator yields significantly improved performance in terms of image fidelity and diversity. The BagGAN framework was also tested on PIDRay and other baggage-screening benchmarks, producing segmentation results comparable to their respective baseline segmenters based on manual annotations.
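The dissertation's clustering algorithm is described only as Sinkhorn distance-based; as background, here is a minimal sketch of the Sinkhorn-Knopp iteration such a learner would build on (epsilon and the iteration count are chosen arbitrarily):

```python
import numpy as np

def sinkhorn_distance(cost, r, c, epsilon=0.1, n_iters=200):
    """Entropic-regularized optimal-transport cost via Sinkhorn-Knopp.
    cost: (n, m) pairwise cost matrix; r, c: marginals summing to 1."""
    K = np.exp(-cost / epsilon)       # Gibbs kernel
    u = np.ones_like(r)
    for _ in range(n_iters):
        v = c / (K.T @ u)             # rescale so columns match marginal c
        u = r / (K @ v)               # rescale so rows match marginal r
    P = u[:, None] * K * v[None, :]   # approximate transport plan
    return float(np.sum(P * cost))

# Example: transport cost between two small point clouds
rng = np.random.default_rng(0)
a, b = rng.normal(size=(5, 2)), rng.normal(size=(6, 2))
cost = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
print(sinkhorn_distance(cost, np.full(5, 1 / 5), np.full(6, 1 / 6)))
```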
74

Improving Text-to-Image Synthesis with U2C Transfer Learning

VINICIUS GOMES PEREIRA 06 February 2024
Generative Adversarial Networks (GANs) are unsupervised models that can learn from an indefinitely large amount of images. On the other hand, models that generate images from language queries depend on high-quality labeled data that is scarce. Transfer learning is a known technique that alleviates the need for labeled data, though it is not trivial to turn an unconditional generative model into a text-conditioned one. This work proposes a simple yet effective fine-tuning approach called Unconditional-to-Conditional Transfer Learning (U2C transfer). It can leverage well-established pre-trained models while learning to respect the given textual conditions. We evaluate the efficiency of U2C transfer by fine-tuning StyleGAN2 on two of the most widely used text-to-image data sources, yielding the Text-Conditioned StyleGAN2 (TC-StyleGAN2) architecture. Our models quickly achieved state-of-the-art results on the CUB-200 and Oxford-102 datasets, with FID values of 7.49 and 9.47, respectively. These values represent relative gains of 7 percent and 68 percent compared to prior work. We show that our method is capable of learning fine-grained details from text queries while producing photorealistic and detailed images. We also show that the models organize the intermediate latent space in a semantically meaningful way. Our findings highlight that the images created using our proposed technique are not only credible but also display robust alignment with their corresponding textual descriptions; indeed, the text-alignment scores achieved by our method are comparable to those of real images.
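For reference, the FID figures quoted above are conventionally computed from the means and covariances of Inception-network features of real (r) and generated (g) images; this is the standard definition, not something specific to this work:

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
             + \operatorname{Tr}\!\bigl(\Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2}\bigr)
```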
75

Random projections in a distributed environment for privacy-preserved deep learning

Bagger Toräng, Malcolm January 2021
Over the last decade alone, the field of Deep Learning (DL) has proven useful for increasingly complex Machine Learning tasks and data, a notable milestone being generative models achieving facial synthesis indistinguishable from real faces. With the increased complexity of DL architectures and training data comes a steep increase in the time and hardware resources required for training. These resources are easily accessible via cloud-based platforms if the data owner is willing to share its training data. To allow for cloud-sharing of its training data, the Swedish Transport Administration (TRV) is interested in evaluating resource-effective, infrastructure-independent, privacy-preserving obfuscation methods to be applied to data collected in real time on distributed Internet-of-Things (IoT) devices. A fundamental problem in this setting is balancing the trade-off between the privacy and the DL utility of the obfuscated training data. We identify statistically measurable, relevant metrics of privacy achievable via obfuscation and compare two prominent alternatives from the literature: optimization-based methods (OBM) and random projections (RP). OBM achieve privacy via direct optimization towards a metric, preserving utility-crucial patterns in the data, and are typically also evaluated in terms of a DL-based adversary's error in estimating sensitive features. RP project data via a random matrix to lower dimensions, preserving pairwise distances between samples while offering privacy in terms of the difficulty of data recovery. The goals of the project centered on evaluating RP against the privacy-metric results previously attained for OBM, comparing adversarial feature estimation error between OBM and RP, and addressing the possibly infeasible learning task of using composite multi-device datasets generated with independent projection matrices. The last goal is relevant to TRV in that multiple devices are likely to contribute to the same composite dataset. Our results complement previous research in that they indicate that both privacy and utility guarantees in a distributed setting vary depending on data type and learning task. These results favor OBM, which theoretically should offer more robust guarantees. Our results and conclusions encourage further experimentation with RP in a distributed setting to better understand the influence of data type and learning task on the privacy-utility trade-off, with target-distributed data sources being a promising starting point.
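A minimal sketch of the RP obfuscation itself, assuming a Gaussian projection matrix (the dimensions and scaling below are illustrative; in the distributed setting studied here, each IoT device would use its own independently drawn matrix):

```python
import numpy as np

def random_projection(X, k, seed=0):
    """Project d-dimensional rows of X down to k dimensions with a Gaussian
    random matrix. Pairwise distances are approximately preserved
    (Johnson-Lindenstrauss), while recovering X without R is hard."""
    rng = np.random.default_rng(seed)
    R = rng.normal(scale=1.0 / np.sqrt(k), size=(X.shape[1], k))
    return X @ R

X = np.random.default_rng(1).normal(size=(100, 512))
Z = random_projection(X, k=64)
# Distances before and after projection are close
print(np.linalg.norm(X[0] - X[1]), np.linalg.norm(Z[0] - Z[1]))
# A composite multi-device dataset would concatenate projections made with
# independent seeds, which is what makes the joint learning task harder
```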
76

Generating Geospatial Trip Data Using Deep Neural Networks

Alhasan, Ahmed January 2022
Synthetic data provides a good alternative to real data when the latter is insufficient or limited by privacy requirements. In spatio-temporal applications, generating synthetic data is generally more complex due to the existence of both spatial and temporal dependencies. Recently, with the advent of deep generative modeling such as Generative Adversarial Networks (GAN), synthetic data generation has seen a lot of development and success. This thesis uses a GAN model based on two Recurrent Neural Networks (RNN), serving as generator and discriminator, to generate new trip data for transport vehicles, where the data is represented as a time series. This model is compared with a standalone RNN network that does not have an adversarial counterpart. The results show that the RNN model (without the adversarial counterpart) performed better than the GAN model, owing to the difficulty involved in training and tuning GAN models.
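A sketch of the generator/discriminator pairing described above, in PyTorch; the layer types, sizes and feature dimensions are assumptions, since the abstract does not specify them:

```python
import torch
import torch.nn as nn

class RNNGenerator(nn.Module):
    """Maps a noise sequence to a synthetic trip (a sequence of features)."""
    def __init__(self, noise_dim=16, hidden=64, out_dim=2):
        super().__init__()
        self.rnn = nn.GRU(noise_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, out_dim)

    def forward(self, z):                 # z: (batch, seq_len, noise_dim)
        h, _ = self.rnn(z)
        return self.head(h)               # (batch, seq_len, out_dim)

class RNNDiscriminator(nn.Module):
    """Scores a trip sequence as real or generated."""
    def __init__(self, in_dim=2, hidden=64):
        super().__init__()
        self.rnn = nn.GRU(in_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                 # x: (batch, seq_len, in_dim)
        _, h = self.rnn(x)                # final hidden state: (1, batch, hidden)
        return self.head(h.squeeze(0))    # (batch, 1) real/fake logit

# Smoke test with random noise and a random "trip"
g, d = RNNGenerator(), RNNDiscriminator()
fake_trip = g(torch.randn(8, 24, 16))
print(d(fake_trip).shape)                 # torch.Size([8, 1])
```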
77

Deep Generative Models for Reservoir Data: An Application in Smart Wells

ALLAN GURWICZ 27 May 2020
Reservoir simulation, which via complex equations emulates flow in reservoir models, is paramount to the Oil & Gas industry. By estimating the behavior of the reservoir given different input conditions, it allows specialists to optimize various parameters in the oilfield project stage. However, the computational time needed for simulations is directly correlated with the complexity of the model, which grows exponentially with each passing day as more intricate and detailed reservoir models are needed in the pursuit of better refinement and uncertainty reduction. As such, optimization techniques that could greatly improve the results of field developments may be made unfeasible. This work proposes the use of deep generative models for the generation of reservoir data, which may then be used for multiple purposes. Deep generative models are systems capable of modeling complex data structures which, after robust training, can sample data following the same distribution as the original dataset.
The present application focuses on smart wells, a completion technology that brings a plethora of advantages, among them better reservoir monitoring and management, though it also carries a significant increase in project investment. The previously mentioned optimizations therefore become indispensable to guarantee the adoption of the technology along with its maximum possible return. To make smart-well control optimizations viable within a reasonable time frame, generative adversarial networks are used here to sample datasets after a relatively small number of simulated scenarios. These datasets are then used to train proxies, algorithms able to substitute for the reservoir simulator and considerably speed up optimization methodologies. Case studies were carried out on both relatively simple and complex industry benchmark models, comparing network architectures and validating each step of the methodology. In the complex model, closest to a real-world scenario, the methodology reduced the proxy error from an average of 18.93 percent to 9.71 percent.
78

Coverage Manifold Estimation in Cellular Networks via Conditional GANs

Veni Goyal (18457590) 29 April 2024
This research introduces an approach utilizing a novel conditional generative adversarial network (cGAN) tailored specifically to the prediction of cellular network coverage. Compared with state-of-the-art convolutional neural network (CNN) methods, our cGAN model offers a significant improvement by translating base station locations within any Region-of-Interest (RoI) into precise coverage probability values within a designated Region-of-Evaluation (RoE).
By leveraging base station location data from diverse geographical and infrastructural landscapes spanning regions like India, the USA, Germany, and Brazil, our model demonstrates superior predictive performance compared to existing CNN-based approaches. Notably, the prediction error, as quantified by the L1 norm, is reduced by two orders of magnitude in comparison to state-of-the-art CNN models.
Furthermore, the coverage manifolds generated by our cGAN model closely resemble those produced by conventional simulation methods, indicating a substantial advancement in both prediction accuracy and visual fidelity. This achievement underscores the potential of cGANs in enhancing the precision and reliability of cellular network performance prediction, offering promising implications for optimizing network planning and deployment strategies.
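The evaluation criterion is simple to state: the L1 norm between the predicted coverage manifold and the simulator's ground truth over the RoE grid. A stand-in illustration (the arrays and noise level are placeholders, not the thesis's data):

```python
import numpy as np

rng = np.random.default_rng(0)
coverage_sim = rng.uniform(size=(64, 64))   # stand-in: simulator ground-truth coverage manifold over the RoE grid
coverage_pred = np.clip(coverage_sim + rng.normal(scale=0.01, size=(64, 64)), 0, 1)  # stand-in: cGAN prediction

l1_error = np.mean(np.abs(coverage_pred - coverage_sim))  # mean per-pixel L1 error
print(l1_error)
```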
79

Analyzing the Negative Log-Likelihood Loss in Generative Modeling

Espuña I Fontcuberta, Aleix January 2022
Maximum-Likelihood Estimation (MLE) is a classic model-fitting method from probability theory. However, it has been argued repeatedly that MLE is inappropriate for synthesis applications, since its priorities are at odds with important principles of human perception, and that, e.g., Generative Adversarial Networks (GANs) are a more appropriate choice. In this thesis, we put these ideas to the test and explore the effect of MLE in deep generative modelling, using image generation as our example application. Unlike previous studies, we apply a new methodology that allows us to isolate the effects of the training paradigm from several common confounding factors of variation, such as the model architecture and the properties of the true data distribution. The thesis addresses two main questions. First, we ask if models trained via Non-Saturating Generative Adversarial Networks (NSGANs) are capable of producing more realistic images than the exact same architecture trained by directly minimizing the Negative Log-Likelihood (NLL) loss function instead (which is equivalent to MLE). We compare the two training paradigms using the MNIST dataset and a normalizing-flow architecture known as Real NVP, which can explicitly represent a very broad family of density functions. We use the Fréchet Inception Distance (FID) as an algorithmic estimate of subjective image quality. Second, we analyze how the NLL loss behaves in the presence of model misspecification, which is when the model architecture is not capable of representing the true data distribution, and compare the resulting training curves and performance to those produced by models without misspecification. In order to control for and study different degrees of model misspecification, we create a realistic-looking (but actually synthetic) toy version of the classic MNIST dataset. By this we mean that we create a machine-learning problem where the examples in the dataset look like MNIST but have in fact been generated by a Real NVP architecture with known weights, so the true distribution that generated the image data is known. We are not aware of this type of large-scale, realistic-looking toy problem having been used in prior work. Our results show that, first, models trained via NLL perform unexpectedly well in terms of FID, and that a Real NVP trained via an NSGAN approach is unstable during training, even at the Nash equilibrium, which is the global optimum onto which the NSGAN training updates are supposed to converge. Second, the experiments on synthetic data show that models with different degrees of misspecification reach different NLL losses on the training set, but all of them exhibit qualitatively similar convergence behavior. However, looking at the validation NLL loss reveals an important overfitting effect due to the finite size of the synthetic dataset: the models that in theory are able to perfectly describe the true data distribution achieve worse validation NLL losses in practice than some misspecified models, whose reduced complexity acts as a regularizer that helps them generalize better. At the same time, we observe that overfitting has a much stronger negative effect on the validation NLL loss than on the image quality as measured by the FID score.
We also conclude that models with too many parameters and degrees of freedom (overparameterized models) should be avoided, as they are not only slow and frequently unstable to train, even using the NLL loss, but they also overfit heavily and produce poorer images. Throughout the thesis, our results highlight the complex and non-intuitive relationship between the NLL loss and the perceptual image quality as measured by the FID score.
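For context, the equivalence the abstract leans on, that minimizing the NLL loss is the same as maximum-likelihood estimation, is simply:

```latex
\hat{\theta}_{\mathrm{MLE}}
  = \arg\max_{\theta} \prod_{i=1}^{N} p_{\theta}(x_i)
  = \arg\min_{\theta} \Bigl(-\sum_{i=1}^{N} \log p_{\theta}(x_i)\Bigr)
  = \arg\min_{\theta} \mathrm{NLL}(\theta)
```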
80

Advances in deep learning with limited supervision and computational resources

Almahairi, Amjad 12 1900
Deep neural networks are the cornerstone of state-of-the-art systems for a wide range of tasks, including object recognition, language modelling and machine translation. In the last decade, research in the field of deep learning has led to numerous key advances in designing novel architectures and training algorithms for neural networks.
However, most success stories in deep learning have relied heavily on two main factors: the availability of large amounts of labelled data and massive computational resources. This thesis by articles makes several contributions to advancing deep learning, specifically in problems with limited or no labelled data, or with constrained computational resources. The first article addresses the sparsity of labelled data that emerges in recommender systems. We propose a multi-task learning framework that leverages natural-language reviews to improve recommendation. Specifically, we apply neural-network-based methods for learning representations of products from review text while learning from rating data. We demonstrate that the proposed method achieves state-of-the-art performance on the Amazon Reviews dataset. The second article tackles computational challenges in training large-scale deep neural networks. We propose a conditional-computation network architecture that can adaptively assign its capacity, and hence its computations, across different regions of the input. We demonstrate the effectiveness of our model on visual recognition tasks where objects are spatially localized within the input, while maintaining much lower computational overhead than standard network architectures. The third article contributes to the domain of unsupervised learning with the generative adversarial networks paradigm. We introduce a flexible adversarial training framework in which not only does the generator converge to the true data distribution, but the discriminator also recovers the relative density of the data at the optimum. We validate our framework empirically by showing that the discriminator is able to accurately estimate the true energy of the data while obtaining state-of-the-art quality of samples. Finally, in the fourth article, we address the problem of unsupervised domain translation. We propose a model that can learn flexible, many-to-many mappings across domains from unpaired data. We validate our approach on several image datasets, and we show that it can be effectively applied in semi-supervised learning settings.
