• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 90
  • 20
  • 14
  • 13
  • 5
  • 4
  • 3
  • 1
  • 1
  • 1
  • Tagged with
  • 170
  • 170
  • 56
  • 41
  • 38
  • 37
  • 31
  • 28
  • 27
  • 26
  • 26
  • 26
  • 23
  • 22
  • 21
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
71

Temporal Variation In Aerosol Composition At Northwestern Turkey

Genc Tokgoz, D. Deniz 01 February 2013 (has links) (PDF)
Daily aerosol samples (PM) were collected at a rural station, which is 5 km away from the Turkish-Bulgarian border between April 2006 and March 2008. Aerosol samples were analyzed for elements by ICPMS, ions by IC and black carbon by aethalometer to provide a multi-species aerosol data set, which can represent aerosol population for Northwestern Turkey and Eastern Europe. Average concentration of SO42-, NO3- and NH4+ was 5.8, 2.9 and 2.0 &mu / g m-3, respectively, while total aerosol mass was 66 &mu / g m-3. Seasonal variation of crustal species had maxima in summer, while most of the anthropogenic species had maxima in winter. Rainfall was found as the only local meteorological parameter affecting aerosols concentrations. The dominant sectors of air masses arriving the Northwestern Turkey were northeast in summer and west-northwest in winter. Air masses were classified into five clusters regarding their wind speed and direction. Most species indicated significant differences between clusters. The influence of forest fires in Ukraine and Russian Federation was identified by cluster analysis using soluble K as tracer. Source apportionment of PM was carried out by EPA PMF model and five sources were resolved. Crustal emissions were found to be the major contributor to PM (41%). The second largest source was distant anthropogenic sources with a contribution of 26%. Traffic was also a remarkable source with 16% contribution. Sea salt and stationary combustion sources accounted for 9% and 8% of PM, respectively. Potential source regions of resolved sources were determined by potential source contribution function (PSCF).
72

Local approaches for collaborative filtering

Lee, Joonseok 21 September 2015 (has links)
Recommendation systems are emerging as an important business application as the demand for personalized services in E-commerce increases. Collaborative filtering techniques are widely used for predicting a user's preference or generating a list of items to be recommended. In this thesis, we develop several new approaches for collaborative filtering based on model combination and kernel smoothing. Specifically, we start with an experimental study that compares a wide variety of CF methods under different conditions. Based on this study, we formulate a combination model similar to boosting but where the combination coefficients are functions rather than constant. In another contribution we formulate and analyze a local variation of matrix factorization. This formulation constructs multiple local matrix factorization models and then combines them into a global model. This formulation is based on the local low-rank assumption, a slightly different but more plausible assumption about the rating matrix. We apply this assumption to both rating prediction and ranking problems, with both empirical validations and theoretical analysis. We contribute with this thesis in four aspects. First, the local approaches we present significantly improve the accuracy of recommendations both in rating prediction and ranking problems. Second, with the more realistic local low-rank assumption, we fundamentally change the underlying assumption for matrix factorization-based recommendation systems. Third, we present highly efficient and scalable algorithms which take advantage of parallelism, suited for recent large scale datasets. Lastly, we provide an open source software implementing the local approaches in this thesis as well as many other recent recommendation algorithms, which can be used both in research and production.
73

Novel document representations based on labels and sequential information

Kim, Seungyeon 21 September 2015 (has links)
A wide variety of text analysis applications are based on statistical machine learning techniques. The success of those applications is critically affected by how we represent a document. Learning an efficient document representation has two major challenges: sparsity and sequentiality. The sparsity often causes high estimation error, and text's sequential nature, interdependency between words, causes even more complication. This thesis presents novel document representations to overcome the two challenges. First, I employ label characteristics to estimate a compact document representation. Because label attributes implicitly describe the geometry of dense subspace that has substantial impact, I can effectively resolve the sparsity issue while only focusing the compact subspace. Second, while modeling a document as a joint or conditional distribution between words and their sequential information, I can efficiently reflect sequential nature of text in my document representations. Lastly, the thesis is concluded with a document representation that employs both labels and sequential information in a unified formulation. The following four criteria are utilized to evaluate the goodness of representations: how close a representation is to its original data, how strongly a representation can be distinguished from each other, how easy to interpret a representation by a human, and how much computational effort is needed for a representation. While pursuing those good representation criteria, I was able to obtain document representations that are closer to the original data, stronger in discrimination, and easier to be understood than traditional document representations. Efficient computation algorithms make the proposed approaches largely scalable. This thesis examines emotion prediction, temporal emotion analysis, modeling documents with edit histories, locally coherent topic modeling, and text categorization tasks for possible applications.
74

Dictionary learning methods for single-channel source separation

Lefèvre, Augustin 03 October 2012 (has links) (PDF)
In this thesis we provide three main contributions to blind source separation methods based on NMF. Our first contribution is a group-sparsity inducing penalty specifically tailored for Itakura-Saito NMF. In many music tracks, there are whole intervals where only one source is active at the same time. The group-sparsity penalty we propose allows to blindly indentify these intervals and learn source specific dictionaries. As a consequence, those learned dictionaries can be used to do source separation in other parts of the track were several sources are active. These two tasks of identification and separation are performed simultaneously in one run of group-sparsity Itakura-Saito NMF. Our second contribution is an online algorithm for Itakura-Saito NMF that allows to learn dictionaries on very large audio tracks. Indeed, the memory complexity of a batch implementation NMF grows linearly with the length of the recordings and becomes prohibitive for signals longer than an hour. In contrast, our online algorithm is able to learn NMF on arbitrarily long signals with limited memory usage. Our third contribution deals user informed NMF. In short mixed signals, blind learning becomes very hard and sparsity do not retrieve interpretable dictionaries. Our contribution is very similar in spirit to inpainting. It relies on the empirical fact that, when observing the spectrogram of a mixture signal, an overwhelming proportion of it consists in regions where only one source is active. We describe an extension of NMF to take into account time-frequency localized information on the absence/presence of each source. We also investigate inferring this information with tools from machine learning.
75

High Performance Parallel Algorithms for Tensor Decompositions / Algorithmes Parallèles pour les Décompositions des Tenseurs

Kaya, Oguz 15 September 2017 (has links)
La factorisation des tenseurs est au coeur des méthodes d'analyse des données massives multidimensionnelles dans de nombreux domaines, dont les systèmes de recommandation, les graphes, les données médicales, le traitement du signal, la chimiométrie, et bien d'autres.Pour toutes ces applications, l'obtention rapide de la décomposition des tenseurs est cruciale pour pouvoir traiter manipuler efficacement les énormes volumes de données en jeu.L'objectif principal de cette thèse est la conception d'algorithmes pour la décomposition de tenseurs multidimensionnels creux, possédant de plusieurs centaines de millions à quelques milliards de coefficients non-nuls. De tels tenseurs sont omniprésents dans les applications citées plus haut.Nous poursuivons cet objectif via trois approches.En premier lieu, nous proposons des algorithmes parallèles à mémoire distribuée, comprenant des schémas de communication point-à-point optimisés, afin de réduire les coûts de communication. Ces algorithmes sont indépendants du partitionnement des éléments du tenseur et des matrices de faible rang. Cette propriété nous permet de proposer des stratégies de partitionnement visant à minimiser le coût de communication tout en préservant l'équilibrage de charge entre les ressources. Nous utilisons des techniques d'hypergraphes pour analyser les paramètres de calcul et de communication de ces algorithmes, ainsi que des outils de partitionnement d'hypergraphe pour déterminer des partitions à même d'offrir un meilleur passage à l'échelle. Deuxièmement, nous étudions la parallélisation sur plate-forme à mémoire partagée de ces algorithmes. Dans ce contexte, nous déterminons soigneusement les tâches de calcul et leur dépendances, et nous les exprimons en termes d'une structure de données idoine, et dont la manipulation permet de révéler le parallélisme intrinsèque du problème. Troisièmement, nous présentons un schéma de calcul en forme d'arbre binaire pour représenter les noyaux de calcul les plus coûteux des algorithmes, comme la multiplication du tenseur par un ensemble de vecteurs ou de matrices donnés. L'arbre binaire permet de factoriser certains résultats intermédiaires, et de les ré-utiliser au fil du calcul. Grâce à ce schéma, nous montrons comment réduire significativement le nombre et le coût des multiplications tenseur-vecteur et tenseur-matrice, rendant ainsi la décomposition du tenseur plus rapide à la fois pour la version séquentielle et la version parallèle des algorithmes.Enfin, le reste de la thèse décrit deux extensions sur des thèmes similaires. La première extension consiste à appliquer le schéma d'arbre binaire à la décomposition des tenseurs denses, avec une analyse précise de la complexité du problème et des méthodes pour trouver la structure arborescente qui minimise le coût total. La seconde extension consiste à adapter les techniques de partitionnement utilisées pour la décomposition des tenseurs creux à la factorisation des matrices non-négatives, problème largement étudié et pour lequel nous obtenons des algorithmes parallèles plus efficaces que les meilleurs actuellement connus.Tous les résultats théoriques de cette thèse sont accompagnés d'implémentations parallèles,aussi bien en mémoire partagée que distribuée. Tous les algorithmes proposés, avec leur réalisation sur plate-forme HPC, contribuent ainsi à faire de la décomposition de tenseurs un outil prometteur pour le traitement des masses de données actuelles et à venir. / Tensor factorization has been increasingly used to analyze high-dimensional low-rank data ofmassive scale in numerous application domains, including recommender systems, graphanalytics, health-care data analysis, signal processing, chemometrics, and many others.In these applications, efficient computation of tensor decompositions is crucial to be able tohandle such datasets of high volume. The main focus of this thesis is on efficient decompositionof high dimensional sparse tensors, with hundreds of millions to billions of nonzero entries,which arise in many emerging big data applications. We achieve this through three majorapproaches.In the first approach, we provide distributed memory parallel algorithms with efficientpoint-to-point communication scheme for reducing the communication cost. These algorithmsare agnostic to the partitioning of tensor elements and low rank decomposition matrices, whichallow us to investigate effective partitioning strategies for minimizing communication cost whileestablishing computational load balance. We use hypergraph-based techniques to analyze computational and communication requirements in these algorithms, and employ hypergraphpartitioning tools to find suitable partitions that provide much better scalability.Second, we investigate effective shared memory parallelizations of these algorithms. Here, we carefully determine unit computational tasks and their dependencies, and express them using aproper data structure that exposes the parallelism underneath.Third, we introduce a tree-based computational scheme that carries out expensive operations(involving the multiplication of the tensor with a set of vectors or matrices, found at the core ofthese algorithms) faster by factoring out and storing common partial results and effectivelyre-using them. With this computational scheme, we asymptotically reduce the number oftensor-vector and -matrix multiplications for high dimensional tensors, and thereby rendercomputing tensor decompositions significantly cheaper both for sequential and parallelalgorithms.Finally, we diversify this main course of research with two extensions on similar themes.The first extension involves applying the tree-based computational framework to computingdense tensor decompositions, with an in-depth analysis of computational complexity andmethods to find optimal tree structures minimizing the computational cost. The second workfocuses on adapting effective communication and partitioning schemes of our parallel sparsetensor decomposition algorithms to the widely used non-negative matrix factorization problem,through which we obtain significantly better parallel scalability over the state of the artimplementations.We point out that all theoretical results in the thesis are nicely corroborated by parallelexperiments on both shared-memory and distributed-memory platforms. With these fastalgorithms as well as their tuned implementations for modern HPC architectures, we rendertensor and matrix decomposition algorithms amenable to use for analyzing massive scaledatasets.
76

Blind inverse imaging with positivity constraints / Inversion aveugle d'images avec contraintes de positivité

Lecharlier, Loïc 09 September 2014 (has links)
Dans les problèmes inverses en imagerie, on suppose généralement connu l’opérateur ou matrice décrivant le système de formation de l’image. De façon équivalente pour un système linéaire, on suppose connue sa réponse impulsionnelle. Toutefois, ceci n’est pas une hypothèse réaliste pour de nombreuses applications pratiques pour lesquelles cet opérateur n’est en fait pas connu (ou n’est connu qu’approximativement). On a alors affaire à un problème d’inversion dite “aveugle”. Dans le cas de systèmes invariants par translation, on parle de “déconvolution aveugle” car à la fois l’image ou objet de départ et la réponse impulsionnelle doivent être estimées à partir de la seule image observée qui résulte d’une convolution et est affectée d’erreurs de mesure. Ce problème est notoirement difficile et pour pallier les ambiguïtés et les instabilités numériques inhérentes à ce type d’inversions, il faut recourir à des informations ou contraintes supplémentaires, telles que la positivité qui s’est avérée un levier de stabilisation puissant dans les problèmes d’imagerie non aveugle. La thèse propose de nouveaux algorithmes d’inversion aveugle dans un cadre discret ou discrétisé, en supposant que l’image inconnue, la matrice à inverser et les données sont positives. Le problème est formulé comme un problème d’optimisation (non convexe) où le terme d’attache aux données à minimiser, modélisant soit le cas de données de type Poisson (divergence de Kullback-Leibler) ou affectées de bruit gaussien (moindres carrés), est augmenté par des termes de pénalité sur les inconnues du problème. La stratégie d’optimisation consiste en des ajustements alternés de l’image à reconstruire et de la matrice à inverser qui sont de type multiplicatif et résultent de la minimisation de fonctions coût “surrogées” valables dans le cas positif. Le cadre assez général permet d’utiliser plusieurs types de pénalités, y compris sur la variation totale (lissée) de l’image. Une normalisation éventuelle de la réponse impulsionnelle ou de la matrice est également prévue à chaque itération. Des résultats de convergence pour ces algorithmes sont établis dans la thèse, tant en ce qui concerne la décroissance des fonctions coût que la convergence de la suite des itérés vers un point stationnaire. La méthodologie proposée est validée avec succès par des simulations numériques relatives à différentes applications telle que la déconvolution aveugle d'images en astronomie, la factorisation en matrices positives pour l’imagerie hyperspectrale et la déconvolution de densités en statistique. / Doctorat en Sciences / info:eu-repo/semantics/nonPublished
77

Nízko-dimenzionální faktorizace pro "End-To-End" řečové systémy / Low-Dimensional Matrix Factorization in End-To-End Speech Recognition Systems

Gajdár, Matúš January 2020 (has links)
The project covers automatic speech recognition with neural network training using low-dimensional matrix factorization. We are describing time delay neural networks with factorization (TDNN-F) and without it (TDNN) in Pytorch language. We are comparing the implementation between Pytorch and Kaldi toolkit, where we achieve similar results during experiments with various network architectures. The last chapter describes the impact of a low-dimensional matrix factorization on End-to-End speech recognition systems and also a modification of the system with TDNN(-F) networks. Using specific network settings, we were able to achieve better results with systems using factorization. Additionally, we reduced the complexity of training by decreasing network parameters with the use of TDNN(-F) networks.
78

Apprentissage de représentations en imagerie fonctionnelle / Learning representations from functional MRI data

Mensch, Arthur 28 September 2018 (has links)
Grâce aux avancées technologiques dans le domaine de l'imagerie fonctionnelle cérébrale, les neurosciences cognitives accumulent une grande quantité de cartes spatiales décrivant de manière quantitative l'activité neuronale suscitée dans le cerveau humain en réponse à des tâches ou des stimuli spécifiques, ou de manière spontanée. Dans cette thèse, nous nous intéressons particulièrement aux données issues de l'imagerie par résonance magnétique fonctionnelle (IRMf), que nous étudions dans un cadre d'apprentissage statistique. Notre objectif est d'apprendre des modèles d'activité cérébrale à partir des données. Nous proposons différentes nouvelles manières de profiter de la grande quantité de données IRMf disponible. Tout d'abord, nous considérons les données d'IRMf de repos, que nous traitons grâce à des méthodes de factorisation de matrices. Nous présentons de nouvelles méthodes pour calculer en un temps raisonnable une factorisation parcimonieuse de matrices constituées de centaines d'enregistrements d'IRMf. Cela nous permet d'extraire des réseaux fonctionnels à partir de données d'une envergure inédite. Notre méthode principale introduit une réduction aléatoire de la dimension des données dans une boucle d'apprentissage en ligne. L'algorithme proposé converge plus de 10 fois plus vite que les meilleures méthodes existantes, pour différentes configurations et sur plusieurs jeux de données. Nous effectuons une vaste validation expérimentale de notre approche de sous-échantillonnage aléatoire. Nous proposons une étude théorique des propriétés de convergence de notre algorithme. Dans un second temps, nous nous intéressons aux données d'IRMf d'activation. Nous démontrons comment agréger différents études acquises suivant des protocoles distincts afin d'apprendre des modèles joints de décodage plus justes et interprétables. Notre modèle multi-études apprend à réduire la dimension des images cérébrales en entrée en même temps qu'il apprend à les classifier, pour chacune des études, à partir de leurs représentations réduites. Cela suscite un transfert d'information entre les études. En conséquence, notre modèle multi-étude est plus performant que les modèles de décodage appris sur chaque étude séparément. Notre approche identifie une représentation universellement pertinente de l'activité cérébrale, supportée par un petit nombre de réseaux optimisés pour l'identification de tâches. / Thanks to the advent of functional brain-imaging technologies, cognitive neuroscience is accumulating maps of neural activity responses to specific tasks or stimuli, or of spontaneous activity. In this work, we consider data from functional Magnetic Resonance Imaging (fMRI), that we study in a machine learning setting: we learn a model of brain activity that should generalize on unseen data. After reviewing the standard fMRI data analysis techniques, we propose new methods and models to benefit from the recently released large fMRI data repositories. Our goal is to learn richer representations of brain activity. We first focus on unsupervised analysis of terabyte-scale fMRI data acquired on subjects at rest (resting-state fMRI). We perform this analysis using matrix factorization. We present new methods for running sparse matrix factorization/dictionary learning on hundreds of fMRI records in reasonable time. Our leading approach relies on introducing randomness in stochastic optimization loops and provides speed-up of an order of magnitude on a variety of settings and datasets. We provide an extended empirical validation of our stochastic subsampling approach, for datasets from fMRI, hyperspectral imaging and collaborative filtering. We derive convergence properties for our algorithm, in a theoretical analysis that reaches beyond the matrix factorization problem. We then turn to work with fMRI data acquired on subject undergoing behavioral protocols (task fMRI). We investigate how to aggregate data from many source studies, acquired with many different protocols, in order to learn more accurate and interpretable decoding models, that predicts stimuli or tasks from brain maps. Our multi-study shared-layer model learns to reduce the dimensionality of input brain images, simultaneously to learning to decode these images from their reduced representation. This fosters transfer learning in between studies, as we learn the undocumented cognitive common aspects that the many fMRI studies share. As a consequence, our multi-study model performs better than single-study decoding. Our approach identifies universally relevant representation of brain activity, supported by a few task-optimized networks learned during model fitting. Finally, on a related topic, we show how to use dynamic programming within end-to-end trained deep networks, with applications in natural language processing.
79

Characterisation of a developer’s experience fields using topic modelling

Déhaye, Vincent January 2020 (has links)
Finding the most relevant candidate for a position represents an ubiquitous challenge for organisations. It can also be arduous for a candidate to explain on a concise resume what they have experience with. Due to the fact that the candidate usually has to select which experience to expose and filter out some of them, they might not be detected by the person carrying out the search, whereas they were indeed having the desired experience. In the field of software engineering, developing one's experience usually leaves traces behind: the code one produced. This project explores approaches to tackle the screening challenges with an automated way of extracting experience directly from code by defining common lexical patterns in code for different experience fields, using topic modeling. Two different techniques were compared. On one hand, Latent Dirichlet Allocation (LDA) is a generative statistical model which has proven to yield good results in topic modeling. On the other hand Non-Negative Matrix Factorization (NMF) is simply a singular value decomposition of a matrix representing the code corpus as word counts per piece of code.The code gathered consisted of 30 random repositories from all the collaborators of the open-source Ruby-on-Rails project on GitHub, which was then applied common natural language processing transformation steps. The results of both techniques were compared using respectively perplexity for LDA, reconstruction error for NMF and topic coherence for both. The two first represent how well the data could be represented by the topics produced while the later estimates the hanging and fitting together of the elements of a topic, and can depict human understandability and interpretability. Given that we did not have any similar work to benchmark with, the performance of the values obtained is hard to assess scientifically. However, the method seems promising as we would have been rather confident in assigning labels to 10 of the topics generated. The results imply that one could probably use natural language processing methods directly on code production in order to extend the detected fields of experience of a developer, with a finer granularity than traditional resumes and with fields definition evolving dynamically with the technology.
80

Machine Learning Approaches to Reveal Discrete Signals in Gene Expression

Changlin Wan (12450321) 24 April 2022 (has links)
<p>Gene expression is an intricate process that determines different cell types and functions in metazoans, where most of its regulation is communicated through discrete signals, like whether the DNA helix is open, whether an enzyme binds with its target, etc. Understanding the regulation signals of the selective expression process is essential to the full comprehension of biological mechanism and complicated biological systems. In this research, we seek to reveal the discrete signals in gene expression by utilizing novel machine learning approaches. Specifically, we focus on two types of data chromatin conformation capture (3C) and single cell RNA sequencing (scRNA-seq). To identify potential regulators, we utilize a new hypergraph neural network to predict genome interactions, where we find the gene co-regulation may result from the shared enhancer element. To reveal the discrete expression state from scRNA-seq data, we propose a novel model called LTMG that considered the biological noise and showed better goodness of fitting compared with existing models. Next, we applied Boolean matrix factorization to find the co-regulation modules from the identified expression states, where we revealed the general property in cancer cells across different patients. Lastly, to find more reliable modules, we analyze the bias in the data and proposed BIND, the first algorithm to quantify the column- and row-wise bias in binary matrix.</p>

Page generated in 0.2528 seconds