1 |
Finite and infinite extensions of regular graphs
Gasquoine, Sarah Louise January 1999
No description available.
|
2 |
Task driven representation learning / Apprentissage de représentation dirigée par la tâche
Wauquier, Pauline 29 May 2017
Machine learning offers numerous algorithms for solving the tasks that arise from real-world prediction problems. To solve these tasks, most machine learning algorithms rely in some way on relationships between instances. Pairwise relationships between instances can be obtained by computing a distance between their vector representations. Given the available vector representation of the data, none of the commonly used distances is guaranteed to be representative of the task to be solved. In this work, we investigate the benefit of adapting the vector representation of the data to the distance used, in order to solve the task better. We focus in particular on an existing graph-based algorithm for classification. We first introduce an algorithm that learns a mapping of the data into a representation space that allows optimal graph-based classification. By projecting the data into a representation space in which a predefined distance is representative of the task, we aim to outperform the initial vector representation of the data when solving the task. A theoretical analysis of the introduced algorithm defines the conditions that ensure optimal classification. Finally, a set of experiments allows us to evaluate the benefit of the introduced approach and to qualify the theoretical analysis.
|
3 |
Improving the Static Resolution of Dynamic Java Features
Sawin, Jason E. 11 September 2009
No description available.
|
4 |
How to explain graph-based semi-supervised learning for non-mathematicians?
Jönsson, Mattias, Borg, Lucas January 2019
The large amount of data available on the web can be used to improve the predictions made by machine learning algorithms. The problem is that such data is often in a raw format and must be manually labeled by a human before a machine learning algorithm can use it. Semi-supervised learning (SSL) is a technique in which the algorithm uses a few labeled samples to automatically label the rest of the data. One approach to SSL is to represent the data in a graph, called graph-based semi-supervised learning (GSSL), and to find similarities between the nodes for automatic labeling. Our goal in this thesis is to simplify the advanced processes and steps needed to implement a GSSL algorithm. We cover basic tasks such as setting up the development environment and more advanced steps such as data preprocessing and feature extraction. The feature extraction techniques covered are bag-of-words (BOW) and term frequency-inverse document frequency (TF-IDF). Lastly, we present how to classify documents using Label Propagation (LP) and Multinomial Naive Bayes (MNB), with a detailed explanation of the inner workings of GSSL. We showcase classification performance by classifying documents from the 20 Newsgroup dataset using LP and MNB. The results are reported using two evaluation scores, F1-score and accuracy. We also compare MNB with LP using two types of kernels, KNN and RBF, on different amounts of labeled documents. The results show that MNB classifies the dataset better than LP.
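As a minimal sketch of the graph-based labeling idea described in this abstract, the following pure-Python example propagates labels over a fully connected graph weighted by an RBF kernel. It is not the thesis's implementation: the function names (`rbf_weight`, `propagate_labels`), the `gamma` value, the iteration count, and the toy points are all illustrative assumptions.

```python
import math

def rbf_weight(a, b, gamma=1.0):
    """RBF similarity between two points: exp(-gamma * squared distance)."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * sq_dist)

def propagate_labels(points, labels, n_classes, gamma=1.0, n_iter=100):
    """Hypothetical helper: iterative label propagation with clamping.

    labels[i] is a class index for labeled points and None for unlabeled ones.
    Returns a predicted class index for every point.
    """
    n = len(points)
    # Edge weights of the dense similarity graph; no self-loops.
    w = [[0.0 if i == j else rbf_weight(points[i], points[j], gamma)
          for j in range(n)] for i in range(n)]
    # One-hot scores for labeled points, all zeros for unlabeled ones.
    scores = [[1.0 if labels[i] == c else 0.0 for c in range(n_classes)]
              for i in range(n)]
    for _ in range(n_iter):
        new_scores = []
        for i in range(n):
            if labels[i] is not None:
                # Clamp: labeled points keep their known label.
                new_scores.append(scores[i])
                continue
            total = sum(w[i])
            # An unlabeled point takes the weighted average of its neighbors.
            new_scores.append([
                sum(w[i][j] * scores[j][c] for j in range(n)) / total
                for c in range(n_classes)
            ])
        scores = new_scores
    return [max(range(n_classes), key=lambda c: row[c]) for row in scores]

# Toy data (assumed): two clusters on a line, one labeled point in each.
points = [(0.0,), (0.2,), (0.4,), (3.0,), (3.2,), (3.4,)]
labels = [0, None, None, 1, None, None]
predicted = propagate_labels(points, labels, n_classes=2)
```

On this toy input each unlabeled point ends up with the label of the labeled point in its own cluster. A KNN kernel would differ only in how the weight matrix `w` is built, keeping weights for the nearest neighbors and zeroing the rest.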
|
5 |
Construção de redes baseadas em vizinhança para o aprendizado semissupervisionado / Graph construction based on neighborhood for semisupervised
Berton, Lilian 25 January 2016
With increasing storage capacity, databases are growing larger and, in many situations, only a small subset of data items can be labeled. This happens because the labeling process is often expensive, time-consuming, and requires the involvement of human experts. Hence, several semi-supervised algorithms have been proposed, showing that it is possible to achieve good results by using prior knowledge from the small fraction of labeled data. Among these algorithms, those based on graphs have gained prominence in the area. This interest is justified by the benefits of graph representations, such as the ability to capture the topological structure of the data, represent hierarchical structures, and model manifolds in high-dimensional spaces. Nevertheless, a large amount of data is represented in attribute-value tables, so graph construction techniques are needed to convert such tabular data into graphs before these algorithms can be applied. Because the generation of the weight matrix and the sparse graph, and their relation to the performance of the algorithms, have been little studied, this thesis investigated these aspects and proposed new methods for graph construction with characteristics not yet explored in the literature. Three methods for graph construction with different topologies were proposed: 1) S-kNN (Sequential k Nearest Neighbors), which generates regular graphs; 2) GBILI (Graph Based on the Informativeness of Labeled Instances) and RGCLI (Robust Graph that Considers Labeled Instances), which exploit the available labels and generate graphs with a power-law degree distribution; 3) GBLP (Graph Based on Link Prediction), which is based on link prediction measures and generates small-world graphs. The proposed strategies were analyzed using measures from graph theory and complex networks and validated in semi-supervised classification tasks. The methods were applied to benchmarks from the area and also to music genre classification and image segmentation. The results show that the topology of the graph directly affects the classification algorithms and that the proposed strategies achieve good accuracy.
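The step of turning an attribute-value table into a graph can be illustrated with a plain symmetric k-nearest-neighbor construction. This sketch is a generic baseline only, not S-kNN, GBILI, RGCLI, or GBLP; the function name `knn_graph`, the use of Euclidean distance, and the toy rows are assumptions made for illustration.

```python
import math

def knn_graph(rows, k):
    """Hypothetical helper: build an undirected kNN graph from feature rows.

    Returns a set of edges (i, j) with i < j. An edge is kept if either
    endpoint lists the other among its k nearest neighbors.
    """
    edges = set()
    for i, row in enumerate(rows):
        # Distances from row i to every other row, nearest first.
        others = sorted(
            (math.dist(row, rows[j]), j) for j in range(len(rows)) if j != i
        )
        for _, j in others[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges

# Toy attribute-value table (assumed): two well-separated pairs of rows.
rows = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
edges = knn_graph(rows, k=1)
# Node degrees, useful for inspecting the topology of the resulting graph.
degree = {i: sum(i in e for e in edges) for i in range(len(rows))}
```

In general a symmetrized kNN graph can contain nodes whose degree exceeds k, so inspecting the `degree` map is one simple way to see how a construction rule shapes the topology.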
|
7 |
CONNECTING THE DOTS : Exploring gene contexts through knowledge-graph representations of gene-information derived from scientific literature
Hellberg, Henrietta January 2023
Analyzing the data produced by next-generation sequencing technologies relies on access to information synthesized from previous research findings. The volume of data available in the literature is growing rapidly, and it is becoming increasingly necessary for researchers to use AI or other statistics-based approaches in the analysis of their datasets. In this project, knowledge graphs are explored as a tool for providing access to contextual gene information available in scientific literature. The explorative method described in this thesis is based on the implementation and comparison of two approaches to knowledge graph construction, a rule-based statistical approach and a neural-network and co-occurrence based approach, each applied to specific literature contexts. The results are presented both as a quantitative comparison between the approaches and as a qualitative expert evaluation of the quantitative result. The quantitative comparison suggested that contrasting knowledge graphs constructed with different approaches can provide valuable information for the interpretation and contextualization of key genes. It also demonstrated the limitations of some approaches, for example in terms of scalability and the volume and type of information that can be extracted. The result further suggested that metrics based on the overlap of nodes and edges, as well as metrics that leverage the global topology of graphs, are valuable for representing and comparing contextual information between knowledge graphs. The qualitative expert evaluation demonstrated that literature-derived knowledge graphs of gene information can be valuable tools for identifying research biases related to genes, and it shed light on the challenges of biological entity normalization in knowledge graph development. In light of these findings, automatic knowledge-graph construction appears to be a promising approach for improving access to contextual information about genes in scientific literature.
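The overlap-based comparison mentioned in the abstract can be sketched with a Jaccard index over node sets and edge sets. The choice of Jaccard, the helper names (`jaccard`, `graph_overlap`), and the placeholder entity names below are illustrative assumptions, not the metrics or data used in the thesis.

```python
def jaccard(a, b):
    """Size of intersection over size of union; 1.0 when both sets are empty."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def graph_overlap(edges_1, edges_2):
    """Hypothetical helper: node and edge overlap of two undirected graphs."""
    # Treat edges as unordered pairs so (u, v) and (v, u) match.
    e1 = {frozenset(e) for e in edges_1}
    e2 = {frozenset(e) for e in edges_2}
    n1 = {node for e in e1 for node in e}
    n2 = {node for e in e2 for node in e}
    return {"nodes": jaccard(n1, n2), "edges": jaccard(e1, e2)}

# Placeholder entity names (assumed), not real extraction results.
kg_rule_based = [("GENE_A", "GENE_B"), ("GENE_B", "GENE_C")]
kg_neural = [("GENE_B", "GENE_A"), ("GENE_C", "GENE_D")]
overlap = graph_overlap(kg_rule_based, kg_neural)
```

Here the two graphs share three of four nodes but only one of three distinct edges, which shows how node overlap and edge overlap can diverge for graphs built from the same literature context.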
|
8 |
Experimental Studies On A New Class Of Combinatorial LDPC Codes
Dang, Rajdeep Singh 05 1900
We implement a package for the construction of a new class of Low Density Parity Check (LDPC) codes based on a new random high-girth graph construction technique, and study the performance of the resulting codes on both the Additive White Gaussian Noise (AWGN) channel and the Binary Erasure Channel (BEC). Our codes are “near regular”, meaning that the left degree of any node in the constructed Tanner graph differs by at most 1 from the average left degree, and likewise for the right degree. Simulations for rate-half codes indicate that the codes perform better than both the regular Progressive Edge Growth (PEG) codes, which are constructed using a similar random technique, and the MacKay random codes. For high rates, the ARG (Almost Regular high Girth) codes perform better than the PEG codes at low to medium SNRs, but the PEG codes seem to do better at high SNRs. We have tried to track both near-codewords and small-weight codewords for these codes to examine the performance at high rates. For the binary erasure channel, the ARG codes perform better than the PEG codes. We have also proposed a modification of the sum-product decoding algorithm in which a quantity called the “node credibility” is used to appropriately process messages to check nodes. This technique substantially reduces the error rates at signal-to-noise ratios of 2.5 dB and beyond for the codes experimented on. The average number of iterations needed to achieve this improved performance is practically the same as for the traditional sum-product algorithm.
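The “near regular” condition can be checked directly on a parity-check matrix. In this sketch, columns are taken as the left (variable) nodes and rows as the right (check) nodes of the Tanner graph, which is a common convention but an assumption here; the helper names and the small matrix are illustrative and are not a code from the thesis.

```python
def degrees(h):
    """Return (left_degrees, right_degrees) of the Tanner graph of matrix h."""
    left = [sum(row[j] for row in h) for j in range(len(h[0]))]  # column weights
    right = [sum(row) for row in h]                              # row weights
    return left, right

def is_near_regular(h):
    """True if every degree is within 1 of the average degree on its side."""
    def within_one(ds):
        avg = sum(ds) / len(ds)
        return all(abs(d - avg) <= 1 for d in ds)
    left, right = degrees(h)
    return within_one(left) and within_one(right)

# Toy 3 x 6 parity-check matrix (assumed example).
H = [
    [1, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0, 1],
]
left, right = degrees(H)
near_regular = is_near_regular(H)
```

For this matrix the left degrees are 2 or 1 around an average of 1.5 and every right degree is 3, so the check passes. A matrix with one very heavy row alongside light rows would fail it.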
|