141

推薦系統資料插補改良法-電影推薦系統應用 / Improving recommendations through data imputation-with application for movie recommendation

楊智博, Yang, Chih Po Unknown Date (has links)
Today, many online stores and e-commerce platforms use recommender systems to raise sales when selling products to consumers. Companies such as Amazon and Netflix study their customers' usage habits in depth, build dedicated recommender systems, and make personalized product recommendations to each customer. Recommender-system techniques fall into two broad categories, collaborative filtering and content filtering. This study focuses on latent factor models for collaborative filtering, using matrix factorization to estimate the rating matrix. Koren et al. (2009) divide matrix factorization algorithms into two main types: stochastic gradient descent and alternating least squares. This study has three goals: first, to compare the predictive ability of alternating least squares and stochastic gradient descent; second, to examine how both algorithms perform once bias terms are added; and third, to first run alternating least squares and stochastic gradient descent, impute the missing values of the original data with their predictions, then apply singular value decomposition to the completed data and compare the results before and after. The results show that stochastic gradient descent requires less computation time than alternating least squares. Moreover, after running the two matrix factorization algorithms and imputing the missing values with their predictions, the subsequent singular value decomposition also shows improved predictive ability. / Recommender systems have been widely used by Internet companies such as Amazon and Netflix to make recommendations to users. Techniques for recommender systems can be divided into the content filtering approach and the collaborative filtering approach. Matrix factorization is a popular method for the collaborative filtering approach; it minimizes the objective function through stochastic gradient descent or alternating least squares. This thesis has three goals. First, we compare the alternating least squares method and the stochastic gradient descent method. Second, we compare the performance of the matrix factorization method with and without the bias term. Third, we combine singular value decomposition and matrix factorization. As expected, we found that stochastic gradient descent takes less time than alternating least squares, and that the matrix factorization method with the bias term gives more accurate predictions. We also found that combining singular value decomposition with matrix factorization can improve predictive accuracy.
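As a rough illustration of the biased matrix-factorization model this abstract refers to (Koren et al., 2009), the sketch below fits user/item factors and bias terms by stochastic gradient descent on a made-up toy rating matrix, then imputes the missing entries with the model's predictions before a final SVD pass. Data, hyperparameters and names are illustrative only, not those used in the thesis.

```python
import numpy as np

def sgd_mf(R, k=2, lr=0.01, reg=0.05, epochs=200, seed=0):
    """Biased matrix factorization fitted by stochastic gradient descent.

    R is a (users x items) rating matrix with np.nan marking missing entries.
    Predictions take the form mu + b_u + b_i + p_u . q_i.
    """
    rng = np.random.default_rng(seed)
    n_u, n_i = R.shape
    P = 0.1 * rng.standard_normal((n_u, k))
    Q = 0.1 * rng.standard_normal((n_i, k))
    b_u, b_i = np.zeros(n_u), np.zeros(n_i)
    obs = np.argwhere(~np.isnan(R))          # indices of observed ratings
    mu = np.nanmean(R)                       # global mean rating
    for _ in range(epochs):
        rng.shuffle(obs)
        for u, i in obs:
            err = R[u, i] - (mu + b_u[u] + b_i[i] + P[u] @ Q[i])
            b_u[u] += lr * (err - reg * b_u[u])
            b_i[i] += lr * (err - reg * b_i[i])
            P[u], Q[i] = (P[u] + lr * (err * Q[i] - reg * P[u]),
                          Q[i] + lr * (err * P[u] - reg * Q[i]))
    return mu, b_u, b_i, P, Q

# Toy ratings matrix (np.nan = unobserved); missing cells are imputed with the
# model's predictions before a final SVD pass, mirroring the thesis's third goal.
R = np.array([[5, 4, np.nan], [4, np.nan, 1], [np.nan, 2, 5]], dtype=float)
mu, b_u, b_i, P, Q = sgd_mf(R)
R_hat = mu + b_u[:, None] + b_i[None, :] + P @ Q.T
R_filled = np.where(np.isnan(R), R_hat, R)
U, s, Vt = np.linalg.svd(R_filled, full_matrices=False)  # SVD on completed data
```

An alternating-least-squares variant would replace the inner loop with closed-form updates of P and Q in turn, which is the trade-off the thesis times against SGD.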
142

Efficient end-to-end monitoring for fault management in distributed systems

Feng, Dawei 27 March 2014 (has links) (PDF)
In this dissertation, we present our work on fault management in distributed systems, motivated by applications in monitoring faults and abrupt changes in large computing systems such as the grid and the cloud. Instead of building complete a priori knowledge of the software and hardware infrastructures, as in conventional detection or diagnosis methods, we propose to use appropriate techniques to perform end-to-end monitoring for such large-scale systems, leaving the inaccessible details of the involved components in a black box. For the fault monitoring of a distributed system, we first model this probe-based application as a static collaborative prediction (CP) task, and experimentally demonstrate the effectiveness of CP methods using the max-margin matrix factorization method. We further introduce active learning into the CP framework and show its critical advantage in dealing with highly imbalanced data, which is especially useful for identifying the minority fault class. We then extend static fault monitoring to the sequential case by proposing the sequential matrix factorization (SMF) method. SMF takes a sequence of partially observed matrices as input and produces predictions using information from both the current and past time windows. Active learning is also employed in SMF so that highly imbalanced data can be handled properly. In addition to the sequential methods, smoothing the estimation sequence proves to be a practically useful trick for enhancing sequential prediction performance. Since the stationarity assumption used in static and sequential fault monitoring becomes unrealistic in the presence of abrupt changes, we propose a semi-supervised online change detection (SSOCD) framework to detect intended changes in time-series data. In this way, the static model of the system can be recomputed once an abrupt change is detected. In SSOCD, an unsupervised offline method is proposed to analyze a sample data series. The change points thus detected are used to train a supervised online model, which gives an online decision about whether a change is present in the arriving data sequence. State-of-the-art change detection methods are employed to demonstrate the usefulness of the framework. All presented work is verified on real-world datasets. Specifically, the fault monitoring experiments are conducted on a dataset collected from the Biomed grid infrastructure within the European Grid Initiative, and the abrupt change detection framework is verified on a dataset concerning the performance change of an online site with a large amount of traffic.
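The probe-based collaborative-prediction idea can be pictured as completing a partially observed monitor-by-service outcome matrix. The thesis uses max-margin matrix factorization and active learning; the sketch below substitutes a plain squared-loss alternating-least-squares completion on synthetic data, so it only illustrates the matrix-completion framing, not the actual method.

```python
import numpy as np

def als_complete(M, mask, k=3, reg=0.1, iters=30, seed=1):
    """Complete a partially observed probe-outcome matrix by alternating
    least squares. M holds +1 (healthy) / -1 (faulty) probe results and
    mask is True where an end-to-end probe was actually sent."""
    rng = np.random.default_rng(seed)
    n, m = M.shape
    U = 0.1 * rng.standard_normal((n, k))
    V = 0.1 * rng.standard_normal((m, k))
    I = np.eye(k)
    for _ in range(iters):
        for i in range(n):                     # update monitor factors
            idx = mask[i]
            A = V[idx].T @ V[idx] + reg * I
            U[i] = np.linalg.solve(A, V[idx].T @ M[i, idx])
        for j in range(m):                     # update service factors
            idx = mask[:, j]
            A = U[idx].T @ U[idx] + reg * I
            V[j] = np.linalg.solve(A, U[idx].T @ M[idx, j])
    return U, V

# Hypothetical 6 x 8 monitor/service matrix with roughly 40% of probes observed.
rng = np.random.default_rng(0)
truth = np.sign(rng.standard_normal((6, 1)) @ rng.standard_normal((1, 8)))
mask = rng.random((6, 8)) < 0.4
M = np.where(mask, truth, 0.0)
U, V = als_complete(M, mask)
pred = np.sign(U @ V.T)   # predicted status of unprobed monitor/service pairs
```

Active learning would then pick the unobserved entries whose predictions are least certain as the next probes to send, which is how the imbalance toward the minority fault class is addressed in the dissertation.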
143

Comparison Of The Rural Atmosphere Aerosol Compositions At Different Parts Of Turkey

Dogan, Guray 01 January 2005 (has links) (PDF)
Long-term data generated at four rural stations are compared to determine similarities and differences in aerosol compositions and the factors contributing to the observed differences between regions of Turkey. The stations used in this study are located on the Mediterranean coast (20 km west of the city of Antalya), on the Black Sea coast (20 km east of the town of Amasra), in Central Anatolia (Çubuk, Ankara) and in the northeastern part of the Anatolian Plateau (at Mt. Uludag). The data used in the comparisons were generated in previous studies; however, some re-analysis of the data was also performed (1) to improve the comparability of the parameters compared and (2) to be able to apply recently developed methodologies to the data sets. Data from the Mediterranean and Black Sea stations were identical in terms of the parameters measured and were suitable for extensive comparison. However, fewer parameters were measured at the Çubuk and Uludag stations, which limited the comparisons involving these two stations. The comparison included levels of major ions and elements, short-term and seasonal variations in concentrations, background (baseline) concentrations of elements, the flow climatology of the regions, correlations between elements, potential source areas affecting the regions, and source types affecting the chemical composition of particles. Comparison of the levels of measured parameters in the four regions showed that some differences in concentrations arise from differences in the local characteristics of the sampling points. For example, the very high concentrations of elements such as Na and Cl in the Mediterranean region are attributed to the closer proximity of the Antalya station to the coast and are not a general feature of the Mediterranean aerosol. There are also significant regional differences in the concentrations of measured elements and ions. Concentrations of anthropogenic elements are very similar at the two coastal stations (Antalya and Amasra), but they are approximately a factor of two smaller at the two stations located on the Anatolian Plateau. This difference between the coastal and high-altitude plateau stations, which is common to all anthropogenic species, is attributed to different source regions and transport mechanisms influencing the coastal regions and the Anatolian Plateau. Some statistically significant differences were also observed in the temporal variations of the elements and ions measured at the different stations. The elements of crustal origin showed a similar seasonal pattern at all stations, with higher concentrations in summer and lower concentrations in winter. This difference between summer and winter is attributed to the suppression of re-suspension of crustal aerosol from wet or ice-covered surface soil in winter. Concentrations of anthropogenic elements, on the other hand, did not show a statistically significant seasonal trend at the Amasra, Çubuk and Uludag stations, but were higher during the summer months at the Antalya station. This difference between the Mediterranean aerosol and the aerosol of Central and Northern Turkey is due to the stronger influence of local sources on the Çubuk, Amasra and Uludag stations and the domination of more distant sources in determining the aerosol composition in the Mediterranean region. A similar conclusion of strong influence of local sources on the chemical composition of particles in Central Anatolia was also suggested by the comparison of baseline concentrations at each station.
The general features of the flow climatology (residence times of upper-atmospheric air masses) in each region are found to be similar, with more frequent flow from the W, WNW, NW and NNW wind sectors. Since these sectors include high-emitting countries in Eastern and Western Europe and Russia, transport from them is expected to bring pollution both from distant European countries and from the more local Balkan countries and the western parts of Turkey. The flow climatology at the stations showed small, but statistically significant, differences between the summer and winter seasons. These variations suggest that the Central Anatolian and Black Sea stations (Çubuk, Amasra and Uludag) are affected by sources located in Western Europe in winter and by sources located in Eastern Europe in summer. The Mediterranean aerosol, on the other hand, is affected by sources in Western Europe and does not show any seasonal differences. This variation in flow climatology between the summer and winter seasons (and the lack of variation at the Mediterranean station) is supported by the seasonal variation (and lack of variation at the Mediterranean station) in the SO42-/NO3- ratio measured at the stations. Potential source contribution function (PSCF) values were calculated for selected elements and ions at each station. The statistical significance of the calculated PSCF values was tested using the bootstrapping technique. The results showed that specific grid cells in Russia and in the Balkan countries are common source regions affecting the concentrations of anthropogenic elements in all four regions of Turkey. However, each station is also affected by its own specific source regions. The aerosol composition on the Anatolian Plateau is affected by sources closer to the sampling points, whereas the Mediterranean and Black Sea aerosols are affected by source regions farther away from the receptors. It should be noted that the same conclusion was also reached in the comparison of seasonal patterns and baseline concentrations at these stations. The types of sources affecting aerosol composition at the Black Sea, Mediterranean and Central Anatolian stations were also compared. Source types affecting the atmospheric composition in these regions were determined using positive matrix factorization (PMF). The results showed that the aerosol in the Black Sea, Central Anatolian and Mediterranean atmospheres consists of 8, 6 and 7 components, respectively. Two of these components, namely a crustal component and a long-range transport component, are common to all three stations. The chemical compositions of these common components are shown to be the same within a 95% statistical significance interval. Three factors, namely a fertilizer factor highly enriched in NH4+ ion, a sea-salt component and an arsenic factor, are common to the Mediterranean and Black Sea aerosols but are not observed in Central Anatolia. The other factors found in the regions are specific to those regions.
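For readers unfamiliar with PSCF, the sketch below shows the usual cell-wise ratio m_ij/n_ij computed from back-trajectory endpoints; the grid, threshold and endpoint data are invented for illustration and do not reproduce the thesis's calculations or its bootstrap significance test.

```python
import numpy as np

def pscf(lat, lon, conc_flag, lat_edges, lon_edges, min_endpoints=10):
    """Potential source contribution function on a lat/lon grid.

    lat, lon  : back-trajectory endpoint coordinates (one entry per endpoint)
    conc_flag : True where the endpoint belongs to a trajectory arriving with
                a receptor concentration above the chosen criterion
                (e.g. the 75th percentile).
    Returns m_ij / n_ij, masked where too few endpoints fall in a cell.
    """
    n_ij, _, _ = np.histogram2d(lat, lon, bins=[lat_edges, lon_edges])
    m_ij, _, _ = np.histogram2d(lat[conc_flag], lon[conc_flag],
                                bins=[lat_edges, lon_edges])
    with np.errstate(invalid="ignore", divide="ignore"):
        ratio = m_ij / n_ij
    return np.where(n_ij >= min_endpoints, ratio, np.nan)

# Hypothetical endpoints over a 2.5-degree grid; in practice the significance
# of the resulting map would be assessed by bootstrapping the trajectories.
rng = np.random.default_rng(2)
lat = rng.uniform(30, 60, 5000)
lon = rng.uniform(10, 50, 5000)
flag = rng.random(5000) < 0.25
grid = pscf(lat, lon, flag, np.arange(30, 61, 2.5), np.arange(10, 51, 2.5))
```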
144

Uma estratégia para predição da taxa de aprendizagem do gradiente descendente para aceleração da fatoração de matrizes. / A strategy to predict the learning rate of gradient descent for accelerating matrix factorization. / Une stratégie pour prédire le taux d'apprentissage du gradient descendant pour l'accélération de la factorisation matricielle.

NÓBREGA, Caio Santos Bezerra. 11 April 2018 (has links)
Capes / Suggesting the most appropriate products to different types of consumers is not a trivial task, even though it is a key factor in increasing their satisfaction and loyalty. For this reason, recommender systems have become an important tool for many applications, such as e-commerce, personalized websites and social networks. Recently, matrix factorization has become the most successful technique for implementing recommender systems. The parameters of the matrix factorization model are typically learned by numerical methods such as gradient descent. The performance of gradient descent is directly related to the setting of the learning rate, which is typically set to small values so as not to miss a local minimum. Consequently, the algorithm may take many iterations to converge. Ideally, one wants a learning rate that leads to a local minimum within the first iterations, but this is very hard to achieve given the high complexity of the space of values to be searched. Starting from an exploratory study on several recommender-system datasets, we observed that, for most datasets, there is a linear pattern between the learning rate and the number of iterations required to reach convergence. Based on this, we propose using simple linear regression models to predict, for an unseen dataset, a good value for the initial learning rate. The idea is to estimate a learning rate that drives gradient descent to a local minimum within the first iterations. We evaluated our technique on 8 real recommender-system datasets and compared it with the standard algorithm, which uses a fixed value for the learning rate, and with learning-rate adaptation techniques from the literature. We show that we can reduce the number of iterations by up to 40% compared with the standard approach. / Suggesting the most suitable products to different types of consumers is not a trivial task, despite being a key factor for increasing their satisfaction and loyalty. Due to this fact, recommender systems have become an important tool for many applications, such as e-commerce, personalized websites and social networks. Recently, matrix factorization has become the most successful technique for implementing recommender systems. The parameters of this model are typically learned by means of numerical methods, such as gradient descent. The performance of gradient descent is directly related to the configuration of the learning rate, which is typically set to small values in order not to miss a local minimum. As a consequence, the algorithm may take several iterations to converge. Ideally, one wants to find a learning rate that will lead to a local minimum in the early iterations, but this is very difficult to achieve given the high complexity of the search space. Starting with an exploratory study on several recommender-system datasets, we observed that there is an overall linear relationship between the learning rate and the number of iterations needed until convergence. From this, we propose to use simple linear regression models to predict, for an unknown dataset, a good value for an initial learning rate. The idea is to estimate a learning rate that drives the gradient descent as close as possible to a local minimum in the first iterations. We evaluate our technique on 8 real-world recommender datasets and compare it with the standard matrix factorization learning algorithm, which uses a fixed value for the learning rate over all iterations, and with techniques from the literature that adapt the learning rate. We show that we can reduce the number of iterations by up to 40% compared to the standard approach.
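One possible reading of the proposed procedure is sketched below: profile a few trial learning rates, fit a simple linear model to the observed iteration counts, and extrapolate a rate expected to converge within a target budget. The numbers are invented, and the exact predictors used in the thesis are not reproduced here.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical profiling results: for a handful of trial learning rates we
# record how many SGD epochs matrix factorization needed before converging
# on a small sample of the new dataset (values are made up for illustration).
trial_lr = np.array([0.001, 0.002, 0.005, 0.010]).reshape(-1, 1)
iters_to_converge = np.array([460.0, 420.0, 300.0, 100.0])

# Exploit the roughly linear pattern reported in the thesis: fit a line and
# extrapolate the learning rate expected to converge within a target budget.
model = LinearRegression().fit(trial_lr, iters_to_converge)
target_iters = 60.0
suggested_lr = (target_iters - model.intercept_) / model.coef_[0]
print(f"suggested initial learning rate: {suggested_lr:.4f}")
```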
145

Sums and products of square-zero matrices

Hattingh, Christiaan Johannes 03 1900 (has links)
Which matrices can be written as sums or products of square-zero matrices? This question is the central premise of this dissertation. Over the past 25 years a significant body of research on products and linear combinations of square-zero matrices has developed, and it is the aim of this study to present this body of research in a consolidated, holistic format that could serve as a theoretical introduction to the subject. The content of the research is presented in three parts: first, results within the broader context of sums and products of nilpotent matrices are discussed, then products of square-zero matrices, and finally sums of square-zero matrices. / Mathematical Sciences / M. Sc. (Mathematics)
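A small example, not taken from the dissertation itself, illustrates the objects in question: two square-zero matrices whose sum and product are far from nilpotent.

```latex
\[
N_1=\begin{pmatrix}0&1\\0&0\end{pmatrix},\qquad
N_2=\begin{pmatrix}0&0\\1&0\end{pmatrix},\qquad
N_1^{2}=N_2^{2}=0,
\]
\[
N_1+N_2=\begin{pmatrix}0&1\\1&0\end{pmatrix},\qquad
N_1N_2=\begin{pmatrix}1&0\\0&0\end{pmatrix}.
\]
```

The sum is invertible (eigenvalues ±1) and the product is a nonzero idempotent, so neither sums nor products of square-zero matrices need be nilpotent, which is what makes the characterization question nontrivial.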
146

Analyse d'image hyperspectrale / Hyperspectral Image Analysis

Faivre, Adrien 14 December 2017 (has links)
This thesis work, carried out under the CIFRE agreement between the mathematics laboratory of Besançon and Digital Surf, publisher of the metrology analysis software Mountains, concerns hyperspectral analysis techniques. A rapidly growing subject, these methods make it possible to exploit images produced by micro-spectroscopy, and in particular by Raman spectroscopy. Digital Surf now aims to design software solutions suited to the images produced by these instruments. These images take the form of cubes of values in which each pixel corresponds to a spectrum. The large size of these data, called hyperspectral images because of the large number of measurements available for each spectrum, forces us to rethink some classical image analysis algorithms. We first consider clustering techniques. The idea is to group into homogeneous classes the different spectra corresponding to similar materials. Clustering is one of the techniques commonly used in data processing; this task nevertheless belongs to a set of problems regarded as too complex to solve in practice, the NP-hard problems. The effectiveness of the various heuristics used in practice was, until recently, poorly understood. We give theoretical arguments that provide guarantees of success when the groups to be separated satisfy certain statistical properties. We then turn to unmixing techniques. This time, the goal is no longer to identify sets of similar pixels in the image, but to interpret each pixel as a linear mixture of different spectral signatures assumed to come from pure materials. This deconstruction of composite spectra translates mathematically into a nonnegative matrix factorization problem, which is NP-hard as well. We therefore consider certain relaxations, which unfortunately prove unconvincing in practice. Unlike the clustering problem, it seems very difficult to give good theoretical guarantees on the quality of the proposed results, so we adopt a more pragmatic approach and propose to regularize the factorization by imposing constraints on the total variation of each factor. Finally, we give an overview of other hyperspectral analysis problems encountered during this thesis, among which are independent component analysis, nonlinear dimension reduction and the decomposition of an image against a library containing a large number of reference spectra. / This dissertation addresses hyperspectral image analysis, a set of techniques enabling the exploitation of micro-spectroscopy images. Images produced by these sensors constitute cubic arrays, meaning that every pixel in the image is actually a spectrum. The size of these images, which is often quite large, calls for an upgrade of classical image analysis algorithms. We start our investigation with clustering techniques. The main idea is to group every spectrum contained in a hyperspectral image into homogeneous clusters. Spectra taken across the image can indeed be generated by similar materials, and hence display spectral signatures resembling each other. Clustering is a commonly used method in data analysis; it nonetheless belongs to a class of particularly hard problems, named NP-hard problems. The efficiency of a few heuristics used in practice was poorly understood until recently. We give theoretical arguments guaranteeing success when the groups studied display certain statistical properties. We then study unmixing techniques. The objective is no longer to decide which class a pixel belongs to, but to understand each pixel as a mix of basic signatures assumed to arise from pure materials. The underlying mathematical problem is again NP-hard. After studying its complexity and considering two relaxations, we describe a more practical way to constrain the problem so as to obtain regularized solutions. We finally give an overview of other hyperspectral image analysis methods encountered during this thesis, among which are independent component analysis, non-linear dimension reduction, and regression against a spectrum library.
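The unmixing step described above amounts to a nonnegative matrix factorization of the pixel-by-band matrix. The sketch below runs a plain, unregularized NMF on a synthetic cube; the total-variation regularization proposed in the thesis is not included, and all sizes are made up.

```python
import numpy as np
from sklearn.decomposition import NMF

# Hypothetical hyperspectral cube: 40 x 40 pixels, 128 spectral bands,
# synthesized as a noisy nonnegative mix of 3 "pure" signatures.
rng = np.random.default_rng(3)
n_pix, n_bands, n_mat = 40 * 40, 128, 3
signatures = rng.random((n_mat, n_bands))           # endmember spectra
abundances = rng.dirichlet(np.ones(n_mat), n_pix)   # per-pixel mixing weights
cube = abundances @ signatures + 0.01 * rng.random((n_pix, n_bands))

# Linear unmixing as NMF: cube ~ W H, with W the estimated abundances and H
# the estimated pure-material spectra. This is only the unregularized baseline.
model = NMF(n_components=n_mat, init="nndsvda", max_iter=500)
W = model.fit_transform(cube)
H = model.components_
print("reconstruction error:", model.reconstruction_err_)
```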
147

Détection, localisation et quantification de déplacements par capteurs à fibre optique / Detection, localization and quantification of displacements thanks to optical fiber sensors

Buchoud, Edouard 13 October 2014 (has links)
For structural health monitoring, optical fiber sensors are generally used because they offer the advantage of providing distributed measurements. In particular, a sensor based on Brillouin technology acquires a Brillouin frequency profile, sensitive to temperature and strain in the optical fiber, over about ten kilometres with a sampling step of the order of ten centimetres. The first problem is to obtain a centimetre-scale profile over the same sensing length; we address it using source separation, deconvolution and inverse-problem methods. We then seek to estimate the athermal strain in the structure, for which several adaptive filtering algorithms are compared. Finally, a procedure is proposed to quantify the displacement of the structure from the strain measurements. All of these methods are tested on simulated data and on real data acquired under controlled conditions. / For structural health monitoring, optical fiber sensors are widely used thanks to their capacity to provide distributed measurements. Based on the principle of Brillouin scattering, optical fiber sensors measure the Brillouin frequency profile, which is sensitive to strain and temperature in the optical fiber, with a metre-scale spatial resolution over several kilometres. The first problem is to obtain a centimetre-scale spatial resolution over the same sensing length. To solve it, source separation, deconvolution and inverse-problem methodologies are used. Then the athermal strain in the structure is sought; several algorithms based on adaptive filters are tested to correct the thermal effect on the strain measurements. Finally, several methods are developed to quantify structure displacements from the athermal strain measurements. They have been tested on simulated data and on data acquired under controlled conditions.
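As an illustration of the athermal-strain step, the sketch below uses a basic LMS adaptive canceller to remove the temperature-correlated part of a made-up Brillouin profile. It is a simplified stand-in for the adaptive filtering algorithms compared in the thesis, with an invented co-located temperature reference.

```python
import numpy as np

def lms_cancel(reference, measured, n_taps=5, mu=0.01):
    """LMS adaptive canceller: estimates the part of `measured` that is
    linearly predictable from `reference` and returns the residual. Here
    `reference` plays the role of a temperature-only record and `measured`
    the Brillouin frequency profile, so the residual approximates the
    athermal strain contribution."""
    w = np.zeros(n_taps)
    residual = np.zeros_like(measured)
    for n in range(n_taps, len(measured)):
        x = reference[n - n_taps:n][::-1]   # most recent reference samples
        e = measured[n] - w @ x             # cancellation error
        w += 2 * mu * e * x                 # LMS weight update
        residual[n] = e
    return residual

# Made-up profiles along the fibre: slow thermal drift plus a localized strain step.
z = np.linspace(0, 100, 2000)
temperature = 0.5 * np.sin(z / 10)
strain = np.where((z > 60) & (z < 65), 0.2, 0.0)
noise = 0.01 * np.random.default_rng(4).standard_normal(z.size)
brillouin = temperature + strain + noise
athermal = lms_cancel(temperature, brillouin)   # strain estimate after cancelling temperature
```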
148

Multivariate analysis of high-throughput sequencing data / Analyses multivariées de données de séquençage à haut débit

Durif, Ghislain 13 December 2016 (has links)
The statistical analysis of high-throughput sequencing (NGS) data raises computational questions regarding modelling and inference, in particular because of the high dimensionality of the data. The research work in this manuscript concerns hybrid dimension-reduction methods based on compression approaches (representation in a low-dimensional space) and on variable selection. Developments are carried out concerning sparse Partial Least Squares regression (supervised) and sparse matrix factorization methods (unsupervised). In both cases, our objective is the reconstruction and visualization of the data. We present a new sparse PLS approach, based on an adaptive penalty, for logistic regression. This approach is used for prediction problems (patient outcome or cell type) from gene expression. The main issue is to take the response into account in order to discard irrelevant variables. We highlight the link between the construction of the algorithms and the reliability of the results. In a second part, motivated by questions arising in single-cell data analysis, we propose a probabilistic approach for the factorization of count matrices that accounts for over-dispersion and zero inflation (characteristics of single-cell data). We develop an estimation procedure based on variational inference and also introduce a probabilistic variable-selection procedure based on a spike-and-slab model. The interest of our method for data reconstruction, visualization and clustering is illustrated by simulations and by preliminary results on a single-cell data analysis. All the proposed methods are implemented in two R packages: plsgenomics and CMF. / The statistical analysis of Next-Generation Sequencing data raises many computational challenges regarding modeling and inference, especially because of the high dimensionality of genomic data. The research work in this manuscript concerns hybrid dimension-reduction methods that rely on both compression (representation of the data in a lower-dimensional space) and variable selection. Developments are made concerning the sparse Partial Least Squares (PLS) regression framework for supervised classification, and the sparse matrix factorization framework for unsupervised exploration. In both situations, our main purpose is the reconstruction and visualization of the data. First, we present a new sparse PLS approach, based on an adaptive sparsity-inducing penalty, that is suitable for logistic regression to predict the label of a discrete outcome. For instance, such a method can be used for prediction (fate of patients or specific type of unidentified single cells) based on gene expression profiles. The main issue in such a framework is to account for the response in order to discard irrelevant variables. We highlight the direct link between the derivation of the algorithms and the reliability of the results. Then, motivated by questions regarding single-cell data analysis, we propose a flexible model-based approach for the factorization of count matrices that accounts for over-dispersion as well as zero inflation (both characteristic of single-cell data), for which we derive an estimation procedure based on variational inference. In this scheme, we consider probabilistic variable selection based on a spike-and-slab model suitable for count data. The interest of our procedure for data reconstruction, visualization and clustering is illustrated by simulation experiments and by preliminary results on single-cell data analysis. All proposed methods were implemented in two R packages, "plsgenomics" and "CMF", based on high-performance computing.
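To make the data model concrete, the sketch below simulates a low-rank, over-dispersed and zero-inflated count matrix of the kind the proposed factorization is designed for. The actual estimation (variational inference with a spike-and-slab prior, as implemented in the CMF package) is not reproduced here; all parameter names and values are illustrative.

```python
import numpy as np

# A minimal generative sketch of the count-data model described above:
# a low-rank gamma-Poisson (negative binomial) factorization with extra
# dropout zeros, mimicking single-cell expression matrices.
rng = np.random.default_rng(5)
n_cells, n_genes, k = 200, 500, 4

U = rng.gamma(shape=2.0, scale=0.5, size=(n_cells, k))   # cell factors
V = rng.gamma(shape=2.0, scale=0.5, size=(n_genes, k))   # gene factors
rate = U @ V.T                                           # low-rank mean matrix

# Over-dispersion: negative binomial counts via a gamma-Poisson mixture.
dispersion = 0.5
counts = rng.poisson(rng.gamma(dispersion, rate / dispersion))

# Zero inflation: additional dropout zeros on top of the count model.
dropout = rng.random((n_cells, n_genes)) < 0.3
counts[dropout] = 0
```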
149

Neuronové sítě pro doporučování knih / Deep Book Recommendation

Gráca, Martin January 2018 (has links)
This thesis deals with the field of recommendation systems using deep neural networks and their use in book recommendation. The main traditional recommender systems are analysed and summarized, as are systems based on more advanced machine-learning techniques. The core of the thesis is to use convolutional neural networks for natural language processing and to create a hybrid book recommendation system. The suggested system includes matrix factorization and makes recommendations based on user ratings and book metadata, including text descriptions. I designed two models, one using the bag-of-words technique and one using a convolutional neural network; both outperform the baseline methods. On the data set created from Goodreads, the CNN model beats the BOW model.
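A hybrid architecture in the spirit described above could combine user/book embeddings with a small convolutional encoder over the book description, as in the Keras sketch below. Layer sizes, vocabulary and sequence length are placeholders, not the thesis's configuration.

```python
from tensorflow.keras import layers, Model

# Placeholder sizes for a hypothetical catalogue and vocabulary.
n_users, n_books, vocab, seq_len, k = 1000, 2000, 20000, 200, 32

user_in = layers.Input(shape=(1,), dtype="int32")
book_in = layers.Input(shape=(1,), dtype="int32")
text_in = layers.Input(shape=(seq_len,), dtype="int32")

u = layers.Flatten()(layers.Embedding(n_users, k)(user_in))   # user latent factors
b = layers.Flatten()(layers.Embedding(n_books, k)(book_in))   # book latent factors

t = layers.Embedding(vocab, 64)(text_in)                      # word embeddings
t = layers.Conv1D(64, 5, activation="relu")(t)                # n-gram features
t = layers.GlobalMaxPooling1D()(t)
t = layers.Dense(k)(t)                                        # project text into the latent space

item = layers.Add()([b, t])                                   # description-enriched book factors
score = layers.Dense(1)(layers.Concatenate()([u, item]))      # predicted rating
model = Model([user_in, book_in, text_in], score)
model.compile(optimizer="adam", loss="mse")
model.summary()
```

Training such a model on (user, book, description, rating) tuples recovers the matrix-factorization signal while letting the convolutional branch help with books that have few ratings.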
150

Neuronové sítě pro doporučování knih / Deep Book Recommendation

Gráca, Martin January 2018 (has links)
This thesis deals with the field of recommendation systems using deep neural networks and their use in book recommendation. The main traditional recommender systems are analysed and summarized, as are systems based on more advanced machine-learning techniques. The core of the thesis is the use of convolutional neural networks for natural language processing and the creation of a book recommendation system. The suggested system makes recommendations based on user data, including user reviews, and book data, including full texts.
