61

The Impact of Pollution Sources on Number and Mass Size Distribution of Atmospheric Particulate Matter in São Paulo

Luís Henrique Mendes dos Santos, 06 August 2018
Several studies have aimed to determine and characterize the atmospheric aerosol in the city of São Paulo with respect to its size and chemical composition, as well as to identify its emission sources and their mass contributions in the studied area. The atmospheric constituents were collected at the sampling station of the Laboratório de Análise dos Processos Atmosféricos (LAPAt) of the Institute of Astronomy, Geophysics and Atmospheric Sciences (IAG) of the University of São Paulo (USP), located in the western zone of the city of São Paulo, at 23°33'34" S, 46°44'00" W. The experiment was conducted from August 15 to September 16, 2016. Particulate matter samples were collected to analyze the mass concentration and chemical composition of the inhalable fine fraction. The particulate mass size distribution was determined with a cascade impactor. The number size distribution was obtained from measurements with a Scanning Mobility Particle Sizer (SMPS), with the particle number concentration (PNC) calculated for the 9 to 450 nm diameter range.

To study the relationships among the gases present in the sampled region, ultraviolet radiation, and the PNC, we used the hourly concentrations of the gases (O3, NO, NO2 and NOx) and UV measured by CETESB's telemetric air quality network in the State of São Paulo. The sampled filters were analyzed by the energy-dispersive X-ray fluorescence (EDX) technique to determine elemental composition. Black carbon (BC) concentrations were obtained by reflectance analysis. To determine the sources of fine particulate matter (PM2.5), the following receptor models were used: Principal Component Analysis (PCA) and Positive Matrix Factorization (PMF). For pollutant dispersion analysis, we used meteorological data from the IAG climatological station located in the southeast of the city. The mean PM2.5 concentration was 18.6 (±12.5) µg/m³ and the mean BC concentration was 1.9 (±1.5) µg/m³ over the sampling period. The main sources found by both the PCA and PMF models were heavy-duty (diesel) vehicles, light-duty vehicles, biomass burning, resuspension of soil dust, pavement and construction dust, secondary processes, and mixed sources. The trace elements peaked in different size modes: Al, Ca, Si and Ti in the accumulation mode, tracers of pavement resuspension; Fe, Mn, P, K and Cr in the coarser fraction of the accumulation mode, tracers of vehicular emissions and biomass burning; and Cu, Zn, Br, Pb, S and BC in the finer fraction of the accumulation mode, also tracers of vehicular emissions and biomass burning.
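The receptor-model step described above factors a samples × species concentration matrix into non-negative source contributions and chemical profiles. As an illustrative sketch only (synthetic data, hypothetical species counts; plain scikit-learn NMF stands in for PMF, which additionally weights residuals by per-measurement uncertainties):

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)

# Synthetic dataset: 100 samples x 8 chemical species, generated from
# 3 hypothetical source profiles (numbers chosen for illustration only).
true_profiles = rng.random((3, 8))               # sources x species
true_contributions = rng.random((100, 3)) * 10   # samples x sources
X = true_contributions @ true_profiles + 0.05 * rng.random((100, 8))

# Plain NMF as a stand-in for PMF: X ~ G @ F with G, F >= 0.
model = NMF(n_components=3, init="nndsvda", max_iter=1000, random_state=0)
G = model.fit_transform(X)   # estimated source contributions per sample
F = model.components_        # estimated chemical profile of each source

relative_error = np.linalg.norm(X - G @ F) / np.linalg.norm(X)
print(round(relative_error, 3))
```

In a real PMF analysis, each factor's profile would then be matched to a source type (traffic, biomass burning, pavement resuspension, etc.) by inspecting which tracer species dominate it.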
62

Improving the discrimination of primary and secondary sources of organic aerosol: use of molecular markers and different approaches

Srivastava, Deepchandra, 26 April 2018
Organic aerosols (OAs), originating from a wide variety of sources and atmospheric processes, have strong impacts on air quality and climate change. This PhD thesis aimed to gain a better understanding of OA origins by using specific organic molecular markers as inputs to a source-receptor model, positive matrix factorization (PMF). The experimental work was based on two field campaigns, conducted in Grenoble (urban site) over the year 2013 and in the Paris region (suburban site of SIRTA, 25 km southwest of Paris) during an intense PM pollution event in March 2015. Following an extended chemical characterization (139 to 216 quantified species), the use of key primary and secondary organic molecular markers within the standard filter-based PMF model allowed 9 and 11 PM10 sources to be deconvolved (Grenoble and SIRTA, respectively).

These included common sources (biomass burning, traffic, dust, sea salt, secondary inorganics and nitrate), as well as less commonly resolved ones such as primary biogenic OA (fungal spores and plant debris), biogenic secondary OA (SOA) (marine, isoprene oxidation) and anthropogenic SOA (oxidation of polycyclic aromatic hydrocarbons (PAHs) and/or phenolic compounds). In addition, the high-time-resolution filter dataset (4 h timebase) available for the Paris region afforded a better understanding of the diurnal profiles and the chemical processes involved. These results were compared to outputs from other measurement techniques (online ACSM (aerosol chemical speciation monitor) and offline AMS (aerosol mass spectrometer) analyses) and/or other data treatment methodologies (the EC (elemental carbon) tracer method and the SOA tracer method). Good agreement was obtained between all the methods in terms of separating the primary and secondary OA fractions. Nevertheless, whatever the method used, about half of the SOA mass remained undescribed. Therefore, a novel OA source apportionment approach was finally developed by combining online (ACSM) and offline (organic molecular marker) measurements, using a time synchronization script. This combined PMF analysis, performed on the unified data matrix, revealed 10 OA factors, including 4 distinct biomass-burning-related chemical profiles. Compared to conventional approaches, this new methodology provides a more comprehensive description of the atmospheric processes related to the different OA sources.
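The combined online/offline analysis described above requires placing the high-frequency ACSM series on the same 4 h timebase as the filter samples. A minimal, hypothetical sketch of that synchronization step with pandas resampling (the thesis's actual script and data formats are not described in this abstract):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical 30-minute ACSM organic-aerosol series covering one day
# of the March 2015 pollution event.
acsm = pd.Series(
    rng.random(48) * 10.0,
    index=pd.date_range("2015-03-10", periods=48, freq="30min"),
    name="OA_ugm3",
)

# Average the online signal onto 4-hour windows matching the filter timebase.
synced = acsm.resample("4h").mean()

print(len(synced))  # 24 h of 30-min data -> 6 four-hour averages
```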
63

Cluster Identification: Topic Models, Matrix Factorization and Concept Association Networks

Arun, R, 07 1900
The problem of identifying clusters arising in the context of topic models and related approaches is important in the area of machine learning. The problem concerning traversals on Concept Association Networks is of great interest in the area of cognitive modelling. Cluster identification is the problem of finding the right number of clusters in a given set of points (or a dataset) in different settings, including topic models and matrix factorization algorithms. Traversals in Concept Association Networks provide useful insights into cognitive modelling and performance. First, we consider the problem of authorship attribution in stylometry and the problem of cluster identification for topic models. For authorship attribution, we show empirically that when stop-words are used as the stylistic features of an author, classification based on vectors obtained from Latent Dirichlet Allocation (LDA) outperforms other classifiers. Topics obtained by this method are generally abstract, and it may not be possible to judge the cohesiveness of the words falling in the same topic by mere manual inspection. Hence it is difficult to determine whether the chosen number of topics is optimal. We next address this issue. We propose a new measure for topics arising out of LDA, based on the divergence between the singular value distribution and the L1-norm distribution of the document-topic and topic-word matrices, respectively. It is shown that under certain assumptions this measure can be used to find the right number of topics. Next we consider the Non-negative Matrix Factorization (NMF) approach for clustering documents. We propose entropy-based regularization for a variant of NMF with row-stochastic constraints on the component matrices. It is shown that when topic-splitting occurs (i.e., when an extra topic is required), an existing topic vector splits into two; the divergence term in the cost function decreases while the entropy term increases, leading to regularization.

Next we consider the problem of clustering in Concept Association Networks (CAN). CANs are generic graph models of relationships between abstract concepts. We propose a simple clustering algorithm that takes into account the complex-network properties of CANs. The performance of the algorithm is compared with that of the graph-cut-based spectral clustering algorithm. In addition, we study the properties of traversals by human participants on CANs. We present experimental results contrasting these traversals with those obtained from (i) random-walk simulations and (ii) shortest-path algorithms.
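The stop-words-as-style idea above can be sketched with scikit-learn: count only stop-words, discarding content words, then use LDA topic proportions as the stylistic fingerprint fed to a downstream classifier. The corpus below is a toy stand-in (real stylometry needs far longer texts):

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, ENGLISH_STOP_WORDS
from sklearn.decomposition import LatentDirichletAllocation

# Toy "documents" by two hypothetical authors; illustrative only.
docs = [
    "the cat and the dog were in the house because it was raining",
    "the bird and the fish were in the garden because it was sunny",
    "a man walks into a room where a table stands near a window",
    "a child runs into a field where a tree grows near a river",
]

# Stop-words as stylistic features: the vocabulary is restricted to
# stop-words, so topical content is deliberately thrown away.
vec = CountVectorizer(vocabulary=sorted(ENGLISH_STOP_WORDS))
X = vec.fit_transform(docs)

# LDA topic proportions over stop-word usage act as the author fingerprint.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
features = lda.fit_transform(X)   # one topic-proportion vector per document

print(features.shape)  # (4, 2); each row sums to 1
```

These per-document vectors would then be passed to an ordinary classifier trained on documents of known authorship.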
64

Wood heating and air quality in the Arve Valley: definition of a surveillance system and impact of a renovation policy for old appliances

Chevrier, Florie, 23 November 2016
Biomass burning is one of the major sources of atmospheric particles during wintertime in Alpine valleys, especially in the Arve valley, where exceedances of the European regulated limit values are regularly observed. This situation led to the establishment of a large program for replacing the least efficient wood-burning appliances, as part of an action of the local Atmospheric Protection Plan (APP), the "Fonds Air Bois". The research program DECOMBIO ("DÉconvolution de la contribution de la COMbustion de la BIOmasse aux PM10 dans la vallée de l'Arve") was set up in October 2013 to measure the impact of this wood stove renewal policy on air quality.

This thesis is part of that program; its main objective is to validate the methodologies used routinely to enable a fast deconvolution of the biomass burning source and to relate any observed changes to the progress of the domestic wood stove replacements. To carry out this work, three sites representing the different situations of the Arve valley were instrumented (Marnaz, Passy and Chamonix) to continuously monitor, throughout the DECOMBIO project, the atmospheric concentrations of black carbon (BC) and of the molecular markers that distinguish the biomass burning contribution from other types of combustion. A large dataset was acquired between November 2013 and October 2014 thanks to regular filter samples, allowing a very fine chemical characterization of PM10. The use of the statistical approach Positive Matrix Factorization (PMF) provided a better understanding of the sources contributing to particle emissions within this valley, with a particular focus on biomass burning emissions. The development of this source identification and apportionment methodology, based on specific organic markers, dedicated model constraints, and carbonaceous-matter deconvolution data, constitutes an important advance in defining the source factors derived from this model. The methodologies developed during this work, which improve our knowledge of source contributions, are thus tools directly usable by the French Accredited Associations for Air Quality Monitoring (AASQA), in particular for the quantitative assessment of measures taken to improve air quality under Atmospheric Protection Plans, among them that of the Arve valley.
65

Minimum Cost Distributed Computing using Sparse Matrix Factorization

Hussein, Seif, January 2023
Distributed computing is an approach in which computationally heavy problems are broken down into more manageable sub-tasks that can be distributed across a number of different computers or servers, allowing for increased efficiency through parallelization. This thesis explores an established distributed computing setting in which the computationally heavy task involves a number of users requesting a linearly separable function to be computed across several servers. This setting yields a condition for feasible computation and communication that can be described by a matrix factorization problem. Moreover, the costs associated with computation and communication are directly related to the number of nonzero elements of the matrix factors, making sparse factors desirable for minimal cost. The Alternating Direction Method of Multipliers (ADMM) is explored as a possible method for solving the sparse matrix factorization problem. To obtain convergence results, extensive convex analysis is conducted on the ADMM iterates, resulting in a theorem that characterizes the limiting points of the iterates as KKT points of the sparse matrix factorization problem. Using the results of this analysis, an algorithm is devised from the ADMM iterates and applied to the sparse matrix factorization problem. An additional implementation is considered for a noisy scenario, in which existing theoretical results are used to justify convergence. Finally, numerical implementations of the devised algorithms are used to perform sparse matrix factorization.
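The cost structure described above (reconstruction feasibility versus number of nonzeros) is the classic l1-regularized factorization trade-off. The thesis solves it with ADMM; the sketch below instead uses plain proximal gradient (ISTA) steps, which share ADMM's key ingredient, the l1 soft-thresholding operator. It is a simplified illustration, not the thesis's algorithm:

```python
import numpy as np

def soft_threshold(X, t):
    """Proximal operator of the l1 norm: shrink entries toward zero."""
    return np.sign(X) * np.maximum(np.abs(X) - t, 0.0)

def sparse_factorize(A, rank, lam=0.1, steps=200, lr=0.01, seed=0):
    """Toy sparse factorization A ~ B @ C with l1 penalties on both factors.

    Alternating proximal gradient (ISTA) steps; ADMM attacks the same
    l1-regularized objective via an augmented-Lagrangian splitting.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    B = rng.random((m, rank))
    C = rng.random((rank, n))
    for _ in range(steps):
        R = B @ C - A                                   # residual for B-step
        B = soft_threshold(B - lr * R @ C.T, lr * lam)
        R = B @ C - A                                   # refresh for C-step
        C = soft_threshold(C - lr * B.T @ R, lr * lam)
    return B, C

# Factor an exactly rank-4 matrix and measure the factors' density.
rng = np.random.default_rng(1)
A = rng.random((20, 4)) @ rng.random((4, 30))
B, C = sparse_factorize(A, rank=4)
density = (np.count_nonzero(B) + np.count_nonzero(C)) / (B.size + C.size)
print(round(density, 2))
```

In the distributed-computing setting, each zero in the factors removes one computation or communication link, which is why the l1 term acts as a proxy for the system's cost.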
66

Discovering Hidden Networks Using Topic Modeling

Cooper, Wyatt, 01 January 2017
This paper explores topic modeling via unsupervised non-negative matrix factorization. The technique is applied to a variety of sources in order to extract salient topics. From these topics, hidden entity networks are discovered and visualized in a graph representation. Other visualization techniques, such as examining a topic's time series and its top words, are used for evaluation and analysis. The project also has a large software component, so the paper additionally discusses the design decisions made to keep the developed program as versatile and extensible as possible.
67

Matrix factorization for coclustering with overlapping columns

Brunialti, Lucas Fernandes, 31 August 2016
Coclustering is a data analysis strategy able to discover clusters, known as coclusters, in which data are grouped based on different subsets of the data's descriptive features. Application contexts characterized by subjectivity, such as text mining, are natural candidates for coclustering: the flexibility to associate documents according to partial feature sets is an appropriate treatment for that subjectivity. Coclustering can be implemented by means of matrix factorization, which is well suited to handle this type of data. In this master's thesis, two coclustering strategies based on non-negative matrix factorization are proposed, capable of finding coclusters with column overlap in a matrix of positive real values. The strategies are presented in terms of their formal definitions and their implementation algorithms. Quantitative and qualitative experimental results are provided for problems based on synthetic datasets and on real datasets, the latter drawn from text mining. The results are analyzed in terms of space quantization and reconstruction capability, clustering capability (using the Rand index and normalized mutual information metrics), and generated information (interpretability of the models). The results confirm the hypothesis that the proposed strategies discover overlapping coclusters naturally, and that such an organization of coclusters provides detailed, and therefore distinctly valuable, information for cluster analysis and text mining.
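The column-overlap idea can be illustrated with plain NMF on a small block matrix whose two coclusters share a band of columns: hard row assignments come from the left factor, while thresholding the right factor yields column memberships that may overlap. This is a simplified stand-in for the thesis's dedicated strategies, with synthetic data:

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)

# Block matrix whose two coclusters share a middle band of columns.
X = np.zeros((8, 9))
X[:4, :6] = rng.random((4, 6)) + 1.0   # cocluster 1: rows 0-3, cols 0-5
X[4:, 3:] = rng.random((4, 6)) + 1.0   # cocluster 2: rows 4-7, cols 3-8

nmf = NMF(n_components=2, init="nndsvda", max_iter=500, random_state=0)
W = nmf.fit_transform(X)
H = nmf.components_

row_cluster = W.argmax(axis=1)   # hard row assignment: one cocluster per row
# Soft column membership: a column joins every cocluster where its weight
# is a sizable fraction of that component's maximum, so overlap is allowed.
col_membership = H > 0.5 * H.max(axis=1, keepdims=True)

# By construction, columns 3-5 carry mass from both blocks.
overlap = np.where(col_membership.all(axis=0))[0]
print(overlap)
```

The 0.5 threshold is an arbitrary illustrative choice; the thesis's strategies derive overlapping column structure from the factorization itself rather than from post-hoc thresholding.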
68

Simplification and analysis of networks with multivariate data

Dias, Markus Diego Sampaio da Silva, 17 October 2018
Visualization techniques play an important role in assisting the understanding of networks and their elements. However, when facing massive networks, analysis tends to be hindered by visual clutter. Simplification and clustering schemes have been among the main alternatives in this context. Nevertheless, most simplification techniques consider only information extracted from the network topology, disregarding additional content defined on the network's nodes or edges. In this work we propose two studies. The first is a new methodology for network simplification that uses both the topology and the content associated with network elements. The proposed methodology is based on non-negative matrix factorization (NMF) and graph matching, combined to generate a hierarchical representation of the network that groups the most similar elements at each level of the hierarchy. We also study the use of graph signal processing theory to filter the data associated with network elements, and its effect on the simplification process.
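The graph-signal-processing filtering mentioned above can be sketched as a one-step low-pass filter built from the graph Laplacian: attenuating a signal's high graph frequencies reduces its roughness x^T L x across edges. A toy path graph with hypothetical node data:

```python
import numpy as np

# Path graph of 6 nodes with a noisy signal on its nodes (hypothetical data).
n = 6
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0
L = np.diag(A.sum(axis=1)) - A   # combinatorial graph Laplacian

rng = np.random.default_rng(0)
smooth = np.linspace(0.0, 1.0, n)           # underlying smooth attribute
noisy = smooth + rng.normal(0, 0.3, n)      # observed node attribute

# One step of low-pass graph filtering: x <- (I - alpha * L) x.
alpha = 0.25
filtered = noisy - alpha * (L @ noisy)

# Filtering reduces the graph roughness x^T L x of the signal.
rough_before = noisy @ L @ noisy
rough_after = filtered @ L @ filtered
print(rough_after < rough_before)  # True for any non-constant signal here
```

Smoothing node attributes this way before grouping makes similar neighbors look more alike, which is the effect on the simplification process that the work studies.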
70

Data Poisoning Attacks on Linked Data with Graph Regularization

January 2019
Social media has become the default way for people to communicate, and its usage has increased exponentially in the last decade. Myriad social media services such as Facebook, Twitter, Snapchat, and Instagram allow people to connect freely with their friends and followers. The number of attackers trying to take advantage of this situation has also grown at an exponential rate. Every social media service has its own recommender systems and user-profiling algorithms, which use users' current information to make recommendations. The data produced by social media services is often linked data, since each item or user is usually linked with other users or items. Recommender systems, being ubiquitous and prominent, are prone to several forms of attack. One major form is poisoning the training set: because recommender systems use current user/item information as the training set, the attacker modifies the training set so that the recommender system either benefits the attacker or gives incorrect recommendations, and hence fails in its basic functionality. Most existing training-set attack algorithms work with "flat" attribute-value data, which is typically assumed to be independent and identically distributed (i.i.d.). However, the i.i.d. assumption does not hold for social media data, since it is inherently linked as described above. Using user similarity with a graph regularizer to morph the training data produces the best results for the attacker. This thesis demonstrates this through experiments on collaborative filtering with multiple datasets. / Dissertation/Thesis / Masters Thesis Computer Science 2019
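The graph-regularizer intuition can be made concrete with the Laplacian quadratic form: for a user-similarity graph, x^T L x sums (x_i - x_j)^2 over linked user pairs, so an injected profile that is smooth over the graph blends in with its neighbors. A toy, hypothetical illustration, not the thesis's actual attack algorithm:

```python
import numpy as np

# User-similarity graph over 5 users (symmetric adjacency, hypothetical).
A = np.array([
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
], dtype=float)
L = np.diag(A.sum(axis=1)) - A   # graph Laplacian

def smoothness(x, L):
    """x^T L x: sums (x_i - x_j)^2 over edges; small for graph-smooth x."""
    return float(x @ L @ x)

# Two candidate injected rating vectors: one consistent with the link
# structure, one ignoring it. A graph-regularized attack prefers the
# smoother one, so the fake profile resembles the users it is linked to.
consistent = np.array([5.0, 5.0, 4.5, 2.0, 2.0])
random_like = np.array([5.0, 1.0, 4.0, 5.0, 1.0])

print(smoothness(consistent, L) < smoothness(random_like, L))  # True
```

In the attack setting, a term like this is added to the attacker's objective so that the poisoned entries are optimized jointly for attack effect and for graph smoothness.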
