51 |
Architecture-aware Algorithm Design of Sparse Tensor/Matrix Primitives for GPUs
Nisa, Israt 02 October 2019 (has links)
No description available.
|
52 |
Complexity evaluation of CNNs in tightly coupled hybrid recommender systems / Komplexitetsanalys av faltningsnätverk i tätt kopplade hybridrekommendationssystem
Ingverud, Patrik January 2018 (has links)
In this report we evaluated how the complexity of a Convolutional Neural Network (CNN), in terms of the number of filters, the size of the filters, and dropout, affects the rating prediction accuracy in a tightly coupled hybrid recommender system. We also evaluated the effect on rating prediction accuracy of pretrained CNNs in comparison to non-pretrained CNNs. We found that a less complex model, i.e. smaller and fewer filters, showed trends of better performance. Less regularization, in terms of dropout, also trended toward better performance for the less complex models. Regarding the comparison of pretrained and non-pretrained models, the experimental results were almost identical for the two denser datasets, while pretraining performed slightly worse on the sparsest dataset.
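To make the complexity knobs concrete, here is a minimal PyTorch sketch of the kind of text-processing CNN used in tightly coupled hybrid recommenders; the module name, dimensions, and the coupling to matrix factorization are illustrative assumptions, not the thesis code.

```python
import torch
import torch.nn as nn

class ItemTextCNN(nn.Module):
    """Toy text-CNN that maps a sequence of word embeddings to an item
    latent factor (ConvMF-style coupling). Filter count, filter size,
    and dropout are the complexity knobs varied in the experiments."""
    def __init__(self, emb_dim=50, n_filters=50, filter_size=3,
                 dropout=0.2, latent_dim=20):
        super().__init__()
        self.conv = nn.Conv1d(emb_dim, n_filters, kernel_size=filter_size)
        self.drop = nn.Dropout(dropout)
        self.proj = nn.Linear(n_filters, latent_dim)

    def forward(self, x):                        # x: (batch, seq_len, emb_dim)
        h = self.conv(x.transpose(1, 2))         # (batch, n_filters, L)
        h = torch.relu(h).max(dim=2).values      # global max pooling
        return self.proj(self.drop(h))           # item latent factor

# In a tightly coupled hybrid, the predicted rating couples this output
# with user factors: r_hat[u, i] = user_factors[u] @ item_cnn(doc_i).
```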
|
53 |
Machine Learning Approaches to Historic Music Restoration
Coleman, Quinn 01 March 2021 (has links) (PDF)
In 1889, a representative of Thomas Edison recorded Johannes Brahms playing a piano arrangement of his piece titled “Hungarian Dance No. 1”. This recording acts as a window into how musical masters played in the 19th century. Yet, due to years of damage to the original recording medium, a wax cylinder, it was unlistenable by the time it was digitized into WAV format. This thesis presents machine learning approaches to an audio restoration system for historic music, which aims to convert this poor-quality Brahms piano recording into a higher-quality one. Digital signal processing is paired with two machine learning approaches: non-negative matrix factorization and deep neural networks. Our results show the advantages and disadvantages of our approaches when we compare them to a benchmark restoration of the same recording made by the Center for Computer Research in Music and Acoustics at Stanford University. They also show how this system offers restoration potential for a wide range of historic music artifacts like this recording, with the minimal overhead made possible by machine learning. Finally, we discuss possible future improvements to these approaches.
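As a rough illustration of the NMF side of such a system, the following Python sketch factors a magnitude spectrogram and discards noise-dominated components; the file name, STFT settings, component count, and which components to drop are assumptions for illustration, not the thesis pipeline.

```python
import numpy as np
import librosa
from sklearn.decomposition import NMF

# Load the digitized recording (placeholder path).
y, sr = librosa.load("brahms_1889.wav", sr=None)

# Magnitude spectrogram: NMF requires a non-negative input matrix.
stft = librosa.stft(y, n_fft=2048, hop_length=512)
S = np.abs(stft)

# Factor into spectral templates (W) and time activations (H).
model = NMF(n_components=16, init="nndsvda", max_iter=400)
W = model.fit_transform(S)          # (freq_bins, components)
H = model.components_               # (components, frames)

# Zero out components judged to be noise-dominated (hypothetical choice)
# and rebuild a cleaner magnitude estimate.
keep = [k for k in range(16) if k not in (3, 7)]
S_clean = W[:, keep] @ H[keep, :]

# Resynthesize with the original phase; Wiener-style masking is a common
# refinement over this simple reconstruction.
y_clean = librosa.istft(S_clean * np.exp(1j * np.angle(stft)), hop_length=512)
```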
|
54 |
Mining Structural and Functional Patterns in Pathogenic and Benign Genetic Variants through Non-negative Matrix Factorization
Peña-Guerra, Karla A 08 1900 (has links)
The main challenge in studying genetics has evolved from identifying variations and their impact on traits to comprehending the molecular mechanisms through which genetic variations affect human biology, including disease susceptibility. Despite a vast number of variants associated with human traits having been identified through large-scale genome-wide association studies (GWAS), a significant portion of them still lack detailed insights into their underlying mechanisms [1]. Addressing this uncertainty requires the development of precise and scalable approaches to discover how genetic variation influences phenotypes at the molecular level. In this study, we developed a pipeline to automate the annotation of structural variant feature effects. We applied this pipeline to a dataset of 33,942 variants from the ClinVar and gnomAD databases, which included both pathogenic and benign associations. To bridge the gap between genetic variation data and molecular phenotypes, we implemented Non-negative Matrix Factorization (NMF) on this large-scale dataset. The algorithm revealed 6 distinct clusters of variants with similar feature profiles. Among these groups, two exhibited a predominant presence of benign variants (accounting for 70% and 85% of the clusters), while one showed an almost equal distribution of pathogenic and benign variants. The remaining three groups were predominantly composed of pathogenic variants, comprising 68%, 83%, and 77% of the respective clusters. These findings provide valuable insights into the underlying mechanisms contributing to pathogenicity. Further analysis of this dataset and the exploration of disease-related genes can enhance the accuracy of genetic diagnosis and therapeutic development through the direct inference of variants that are likely to affect the functioning of essential genes.
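A minimal sketch of the clustering step is shown below; the matrix shapes, scaling choice, and labels are placeholders rather than the actual annotation pipeline, but it illustrates how NMF loadings can partition variants into groups with distinct feature profiles.

```python
import numpy as np
from sklearn.decomposition import NMF
from sklearn.preprocessing import MinMaxScaler

# X: variants x annotated feature effects (placeholder values).
X = MinMaxScaler().fit_transform(np.random.rand(33942, 40))

nmf = NMF(n_components=6, init="nndsvd", max_iter=500, random_state=0)
W = nmf.fit_transform(X)            # variant loadings on 6 components
H = nmf.components_                 # component x feature profiles

# Assign each variant to its dominant component, yielding 6 clusters
# whose pathogenic/benign composition can then be examined.
cluster = W.argmax(axis=1)
labels = np.random.randint(0, 2, size=X.shape[0])   # 1 = pathogenic (placeholder)
for k in range(6):
    print(f"cluster {k}: {labels[cluster == k].mean():.0%} pathogenic")
```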
|
55 |
Evaluating, Understanding, and Mitigating Unfairness in Recommender Systems
Yao, Sirui 10 June 2021 (has links)
Recommender systems are information filtering tools that discover potential matchings between users and items and benefit both parties. This benefit can be considered a social resource that should be equitably allocated across users and items, especially in critical domains such as education and employment. Biases and unfairness in recommendations raise both ethical and legal concerns. In this dissertation, we investigate the concept of unfairness in the context of recommender systems. In particular, we study appropriate unfairness evaluation metrics, examine the relation between bias in recommender models and inequality in the underlying population, as well as propose effective unfairness mitigation approaches.
We start by exploring the implications of fairness in recommendation and formulating unfairness evaluation metrics. We focus on the task of rating prediction. We identify the insufficiency of demographic parity for scenarios where the target variable is justifiably dependent on demographic features. We then propose an alternative set of unfairness metrics that are measured based on how much the average predicted ratings deviate from the average true ratings. We also reduce these forms of unfairness in matrix factorization (MF) models by explicitly adding them as penalty terms to the learning objective.
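One plausible instantiation of such a deviation-based metric is sketched below; the array layout and grouping variable are assumptions, and the dissertation's exact definitions may differ.

```python
import numpy as np

def value_unfairness(pred, true, mask, group):
    """For each item, compare how far the average predicted rating deviates
    from the average true rating for one user group versus the other, then
    average the absolute gap over items.
    pred, true: (n_users, n_items); mask: bool array of observed entries;
    group: bool array (n_users,) of group membership."""
    gaps = []
    for j in range(pred.shape[1]):
        obs = mask[:, j]
        g, ng = obs & group, obs & ~group
        if g.sum() == 0 or ng.sum() == 0:
            continue                       # skip items unrated by a group
        dev_g = pred[g, j].mean() - true[g, j].mean()
        dev_ng = pred[ng, j].mean() - true[ng, j].mean()
        gaps.append(abs(dev_g - dev_ng))
    return float(np.mean(gaps))

# Used as a penalty during training:
# loss = squared_error(pred[mask], true[mask]) + lam * value_unfairness(...)
```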
Next, we target a form of unfairness in matrix factorization models observed as disparate model performance across user groups. We identify four types of biases in the training data that contribute to higher subpopulation error. We then propose personalized regularization learning (PRL), which learns personalized regularization parameters that directly address the data biases. PRL poses the hyperparameter search problem as a secondary learning task. It enables back-propagation to learn the personalized regularization parameters by leveraging the closed-form solutions of alternating least squares (ALS), the method used to solve MF. Furthermore, the learned parameters are interpretable and provide insights into how fairness is improved.
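The closed-form ALS update that PRL differentiates through looks roughly like the sketch below: a forward computation of the user-factor half-step with per-user regularization strengths. It is not the dissertation's PRL implementation, and the hyperparameter learning itself is omitted.

```python
import numpy as np

def als_user_step(R, mask, Q, lam_u):
    """One ALS half-step for the user factors P, with a separate
    regularization strength lam_u[u] for each user. Because the solve is
    closed-form, gradients with respect to lam_u can be back-propagated
    in a PRL-style secondary learning task (not shown here).
    R: (n_users, n_items) ratings; mask: bool array of observed entries;
    Q: (n_items, k) item factors; lam_u: (n_users,) regularization."""
    n_users, k = R.shape[0], Q.shape[1]
    P = np.zeros((n_users, k))
    for u in range(n_users):
        idx = mask[u]                          # items rated by user u
        Qu = Q[idx]                            # (n_u, k)
        A = Qu.T @ Qu + lam_u[u] * np.eye(k)
        b = Qu.T @ R[u, idx]
        P[u] = np.linalg.solve(A, b)
    return P
```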
Third, we conduct a theoretical analysis of the long-term dynamics of inequality in the underlying population, in terms of the fit between users and items. We view the task of recommendation as solving a set of classification problems through threshold policies. We mathematically formulate the transition dynamics of the user-item fit in one step of recommendation. We then prove that a system with the formulated dynamics always has at least one equilibrium, and we provide sufficient conditions for the equilibrium to be unique. We also show that, depending on the item category relationships and the recommendation policies, recommendations in one item category can reshape the user-item fit in another item category.
To summarize, in this research, we examine different fairness criteria in rating prediction and recommendation, study the dynamics of interactions between recommender systems and users, and propose mitigation methods to promote fairness and equality. / Doctor of Philosophy / Recommender systems are information filtering tools that discover potential matchings between users and items. However, a recommender system, if not properly built, may not treat users and items equitably, which raises ethical and legal concerns. In this research, we explore the implications of fairness in the context of recommender systems, study the relation between unfairness in recommender output and inequality in the underlying population, and propose effective unfairness mitigation approaches.
We start by finding unfairness metrics appropriate for recommender systems. We focus on the task of rating prediction, which is a crucial step in recommender systems. We propose a set of unfairness metrics measured as the disparity in how much predictions deviate from the ground-truth ratings. We also offer a mitigation method to reduce these forms of unfairness in matrix factorization models.
Next, we look deeper into the factors that contribute to error-based unfairness in matrix factorization models and identify four types of biases that contribute to higher subpopulation error. We then propose personalized regularization learning (PRL), a mitigation strategy that learns personalized regularization parameters to directly address data biases. The learned per-user regularization parameters are interpretable and provide insight into how fairness is improved.
Third, we conduct a theoretical study of the long-term dynamics of the inequality in the fit (e.g., interest, qualification) between users and items. We first mathematically formulate the transition dynamics of the user-item fit in one step of recommendation. We then discuss the existence and uniqueness of the system equilibrium as the one-step dynamics repeat. We also show that, depending on the relation between item categories and the recommendation policies (unconstrained or fair), recommendations in one item category can reshape the user-item fit in another item category.
In summary, we examine different fairness criteria in rating prediction and recommendation, study the dynamics of interactions between recommender systems and users, and propose mitigation methods to promote fairness and equality.
|
56 |
Evaluation of PM2.5 Components and Source Apportionment at a Rural Site in the Ohio River Valley Region
Deshpande, Seemantini R. 27 September 2007 (has links)
No description available.
|
57 |
Latent Factor Models for Recommender Systems and Market Segmentation Through Clustering
Zeng, Jingying 29 August 2017 (has links)
No description available.
|
58 |
Air Quality in Mexico City: Spatial and Temporal Variations of Particulate Polycyclic Aromatic Hydrocarbons and Source Apportionment of Gasoline-Versus-Diesel Vehicle Emissions
Thornhill, Dwight Anthony Corey 21 August 2007 (has links)
The Mexico City Metropolitan Area (MCMA) is one of the largest cities in the world, and as with many megacities worldwide, it experiences serious air quality and pollution problems, especially with ozone and particulate matter. Ozone levels exceed the health-based standard, which is equivalent to the U.S. standard, on approximately 80% of all days, and concentrations of particulate matter 10 μm and smaller (PM10) exceed the standard on more than 40% of all days in most years. Particulate polycyclic aromatic hydrocarbons (PAHs) are a class of semi-volatile compounds that are formed during combustion and many of these compounds are known or suspected carcinogens. Recent studies on PAHs in Mexico City indicate that very high concentrations have been observed there and may pose a serious health hazard.
The first part of this thesis describes results from the Megacities Initiative: Local and Regional Observations (MILAGRO) study in Mexico City in March 2006. During this field campaign, we measured PAH and aerosol active surface area (AS) concentrations at six different locations throughout the city using the Aerodyne Mobile Laboratory (AML). The different sites encompassed a mix of residential, commercial, industrial, and undeveloped land use. The goals of this research were to describe spatial and temporal patterns in PAH and AS concentrations, to gain insight into sources of PAHs, and to quantify the relationships between PAHs and other pollutants. We observed that the highest measurements were generally found at sites with dense traffic networks. Also, PAH concentrations varied considerably in space. An important implication of this result is that for risk assessment studies, a single monitoring site will not adequately represent an individual's exposure.
Source identification and apportionment are essential for developing effective control strategies to improve air quality and thereby reduce the health impacts associated with fine particulate matter and PAHs. However, very few studies have separated gasoline- versus diesel-powered vehicle emissions under a variety of on-road driving conditions. The second part of this thesis focuses on distinguishing between the two types of engine emissions within the MCMA using positive matrix factorization (PMF) receptor modeling. The Aerodyne Mobile Laboratory drove throughout the MCMA in March 2006 and measured on-road concentrations of a large suite of gaseous and particulate pollutants, including carbon dioxide, carbon monoxide (CO), nitric oxide (NO), benzene (C6H6), formaldehyde (HCHO), ammonia (NH3), fine particulate matter (PM2.5), PAHs, and black carbon (BC). These pollutant species served as the input data for the receptor model. Fuel-based emission factors and annual emissions within Mexico City were then calculated from the source profiles of the PMF model and fuel sales data. We found that gasoline-powered vehicles were responsible for 90% of mobile source CO emissions and 85% of volatile organic compound (VOC) emissions, while diesel-powered vehicles accounted for almost all NO emissions (99.98%). Furthermore, the annual emissions estimates for CO and VOCs were lower than those estimated during the MCMA-2003 field campaign.
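For readers unfamiliar with receptor modeling, the sketch below uses an unweighted NMF as a rough stand-in for PMF (true PMF additionally weights each observation by its measurement uncertainty) to separate two source factors from on-road pollutant time series and apportion CO between them; the species list, data, and factor interpretation are placeholders.

```python
import numpy as np
from sklearn.decomposition import NMF

# Rows: on-road samples; columns: measured species (placeholder data).
species = ["CO", "NO", "C6H6", "HCHO", "NH3", "PM2.5", "PAH", "BC"]
X = np.random.rand(5000, len(species))

pmf = NMF(n_components=2, init="nndsvda", max_iter=500, random_state=0)
G = pmf.fit_transform(X)        # factor contributions over time
F = pmf.components_             # factor profiles (source signatures)

# Suppose factor 0 is gasoline-like (CO-rich) and factor 1 diesel-like
# (NO-rich); each species' split between engine types follows from the
# reconstructed contribution of each factor.
co = species.index("CO")
co_by_factor = (G * F[:, co]).sum(axis=0)   # per-factor CO contribution
print("gasoline share of CO:", co_by_factor[0] / co_by_factor.sum())
```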
The number of megacities is expected to grow dramatically in the coming decades. As one of the world's largest megacities, Mexico City serves as a model for studying air quality problems in highly populated, extremely polluted environments. The results of this work can be used by policy makers to improve air quality and reduce related health risks in Mexico City and other megacities. / Master of Science
|
59 |
A Statistical Methods-Based Novel Approach for Fully Automated Analysis of Chromatographic Data
Kim, Sungwoo 04 December 2024 (has links)
Atmospheric samples are complex mixtures that contain thousands of volatile organic compounds (VOCs) with diverse physicochemical properties and multiple isomers. These compounds can interact with nitrogen oxides, leading to the formation of ozone and particulate matter, which have detrimental effects on human health. Therefore, it is essential to apply effective analytical methods to obtain valuable information about the sources and transformation processes of these samples. Gas chromatography coupled with mass spectrometry (GC-MS) is a widely used method for the analysis of these complex mixtures due to its sensitivity and resolution. However, it presents significant challenges in data reduction and analyte identification due to the complexity and variability of atmospheric data. Traditional processing methods for large GC-MS datasets are highly time-consuming and may lead to the loss of potentially valuable information from relatively weak signals and to incomplete characterization of compounds. This study addresses these challenges. An automated approach is developed that catalogs and identifies nearly all analytes in large chromatographic datasets by combining factor analysis and a decision tree approach to de-convolute peaks. This approach was applied to data from the GoAmazon 2014/5 campaign and cataloged more than 1000 unique analytes. A novel method is then introduced to automatically identify quantification ions for single-ion chromatogram (SIC) based peak fitting and integration to generate time series of analytes. Through these combined approaches, a complex GC-MS dataset of atmospheric composition is reduced and processed fully automatically. Additionally, a machine learning-based dimensionality reduction algorithm was applied to the generated time series data for systematic characterization and categorization of both identified and unidentified compounds, clustering them into 8 distinct groups based on their temporal variation. These data are then used to generate fundamental insight into how atmospheric processes impact atmospheric composition. This analysis aimed to elucidate the effects of meteorological conditions on these compounds, particularly the impact of wet deposition through precipitation scavenging on gas- and particle-phase oxygenated compounds. Hourly removal rates for all analytes were estimated by examining the impacts of precipitation on their concentrations. / Doctor of Philosophy / Atmospheric samples are made up of thousands of different volatile organic compounds (VOCs) with varying chemical properties and multiple forms, making them highly complex. These compounds can interact with nitrogen oxides, leading to the formation of ozone and particulate matter, which can have serious health effects. To better understand the sources and transformations of these compounds, it is crucial to use effective analytical methods. Gas chromatography coupled with mass spectrometry (GC-MS) is a powerful tool commonly used to analyze these complex mixtures due to its high sensitivity and ability to separate different compounds. However, the complex nature of atmospheric data poses challenges in analyzing and identifying the vast number of compounds present. Traditional methods for processing large GC-MS datasets are often time-consuming and may overlook potentially important but weak signals, resulting in incomplete identification of compounds.
This study addresses these challenges by developing an automated method that efficiently catalogs and identifies almost all compounds in large GC-MS datasets. By combining factor analysis with a decision tree approach, the new method can separate overlapping signals and identify distinct compounds. This approach was applied to data from the GoAmazon 2014/5 campaign, successfully cataloging over 1,000 unique analytes. Additionally, a novel technique was introduced to automatically identify the best ions for quantifying each analyte and generate concentration time series data. The processed data were further analyzed using a machine learning algorithm to group both known and unknown analytes into 8 distinct categories based on their behavior over time. This analysis provided key insights into how atmospheric processes, especially weather conditions such as rainfall, affect the composition of these analytes. The study estimated the rate at which different analytes were removed from the atmosphere by precipitation, shedding light on the impact of wet deposition on gas- and particle-phase compounds.
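As a generic illustration of the final categorization step, the sketch below normalizes each analyte's time series, reduces dimensionality, and clusters the analytes into eight groups by temporal shape; the data are placeholders and the dissertation's actual dimensionality-reduction algorithm may differ.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

# ts: analytes x hourly time points (placeholder data); each row is one
# analyte's concentration time series from the SIC-based integration.
ts = np.random.rand(1000, 720)

# Z-score each analyte along time so clustering reflects temporal shape,
# not absolute concentration, then reduce dimensionality before clustering.
Z = StandardScaler().fit_transform(ts.T).T
emb = PCA(n_components=10, random_state=0).fit_transform(Z)
groups = KMeans(n_clusters=8, n_init=10, random_state=0).fit_predict(emb)
print(np.bincount(groups))    # number of analytes in each temporal group
```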
|
60 |
Accuracy and Interpretability Testing of Text Mining Methods
Ashton, Triss A. 08 1900 (has links)
Extracting meaningful information from large collections of text data is problematic because of the sheer size of the database. However, automated analytic methods capable of processing such data have emerged. These methods, collectively called text mining, first began to appear in 1988. A number of additional text mining methods quickly developed in independent research silos, each based on unique mathematical algorithms. How well each of these methods analyzes text is unclear. Method development typically evolves from some research-silo-centric requirement, with the success of the method measured by a custom requirement-based metric. Results of the new method are then compared to another method that was similarly developed. The proposed research introduces an experimentally designed testing method for text mining that eliminates research silo bias and simultaneously evaluates methods from all of the major context-region text mining method families. The proposed research method follows a randomized block factorial design with two treatments consisting of three and five levels (RBF-35) with repeated measures. The contribution of the research is threefold. First, the users perceived a difference in the effectiveness of the various methods. Second, while still not fully clear, there are characteristics within the text collection that affect the algorithms' ability to extract meaningful results. Third, this research develops an experimental design process for testing the algorithms that is adaptable to other areas of software development and algorithm testing. This design eliminates the biased practices historically employed by algorithm developers.
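To make the design concrete, the sketch below lays out a randomized block factorial layout with subjects as repeated-measure blocks and analyzes it with a mixed linear model; the column names, factor levels, and model specification are illustrative assumptions, not the study's actual analysis.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
methods = ["lsa", "lda", "nmf", "som", "vsm"]     # treatment 1: five levels
collections = ["news", "abstracts", "forums"]     # treatment 2: three levels
subjects = [f"s{i}" for i in range(12)]           # blocks (repeated measures)

# Every subject scores every method x collection cell (placeholder scores).
rows = [{"subject": s, "method": m, "collection": c,
         "score": int(rng.integers(1, 6))}
        for s in subjects for m in methods for c in collections]
df = pd.DataFrame(rows)

# Subjects enter as random blocks; the two treatments and their interaction
# are the fixed effects of interest in an RBF-35-style analysis.
fit = smf.mixedlm("score ~ C(method) * C(collection)", df,
                  groups=df["subject"]).fit()
print(fit.summary())
```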
|