1 |
Enabling information-centric networking: architecture, protocols, and applications
Cho, Tae Won, 1978- 23 November 2010
As the Internet is becoming information-centric, network services increasingly demand scalable and efficient communication of information between a multitude of information producers and large groups of interested information consumers. Such information-centric services are growing rapidly in use and deployment. Examples of deployed information-centric services include IPTV, MMORPG, VoD, video conferencing, file sharing, software updates, RSS dissemination, online markets, and grid computing. To effectively support future information-centric services, the network infrastructure for multi-point communication has to address a number of significant challenges: (i) how to understand massive information-centric groups in a scalable manner, (ii) how to analyze and predict the evolution of those groups in an accurate and efficient way, and (iii) how to disseminate content from information producers to a vast number of groups with potentially long-lived membership and highly diverse, dynamic group activity levels. This dissertation proposes a novel architecture and protocols that effectively address the above challenges in supporting multi-point communication for future information-centric network services. In doing so, we make the following three major contributions: (1) We develop a novel technique called Proximity Embedding (PE) that can approximate a family of path-ensemble-based proximity measures for information-centric groups. We develop Clustered Spectral Graph Embedding (SCGE), which captures the essential structure of large graphs in a highly efficient and scalable manner. Our techniques help to explain the proximity (closeness) of users in information-centric groups, and can be applied to a variety of analysis tasks on complex network structures.
(2) Based on SCGE, we develop new supervision-based link prediction techniques called Clustered Spectral Learning and Clustered Polynomial Learning that enable us to predict the evolution of massive and complex network structures in an accurate and efficient way. By exploiting supervised information from past snapshots of network structures, our methods yield up to 20% improvement in link prediction accuracy when compared to existing state-of-the-art methods. (3) Finally, we develop a novel multicast infrastructure called Multicast with Adaptive Dual-state (MAD). MAD supports a large number of groups and group memberships, and efficient content dissemination in the presence of dynamic group activity. We demonstrate the effectiveness of our approach through extensive simulation, analysis, and emulation with a real system implementation.
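As a rough illustration of what a path-ensemble-based proximity measure can look like, the sketch below computes a truncated Katz score, which sums over walks of increasing length between two nodes and damps longer walks. The abstract does not name the measures it approximates, so the choice of Katz, the function names `mat_mul` and `katz_truncated`, the damping factor `beta`, and the toy graph are all assumptions made for illustration; this is not the dissertation's Proximity Embedding or SCGE.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def katz_truncated(adj, beta, max_len):
    """Sum beta**l * (number of walks of length l) for l = 1..max_len.

    adj is an adjacency matrix; beta < 1 damps longer walks.
    """
    n = len(adj)
    total = [[0.0] * n for _ in range(n)]
    # Start from the identity so the first multiplication yields adj itself.
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    weight = 1.0
    for _ in range(max_len):
        power = mat_mul(power, adj)   # walk counts for the next length
        weight *= beta                # damping for that length
        for i in range(n):
            for j in range(n):
                total[i][j] += weight * power[i][j]
    return total


# Hypothetical toy graph: a path 0 - 1 - 2 - 3.
adj = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = katz_truncated(adj, beta=0.1, max_len=4)
```

On this toy graph the score from node 0 decreases with hop distance, so node pairs with higher scores would be the more plausible candidates for a future link. The dense matrix products here scale poorly, which is the kind of cost that an approximation technique is meant to avoid on large graphs.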
|
2 |
Estudo de coeficientes de correlação para medidas de proximidade em dados de expressão gênica / A study of correlation coefficients as proximity measures for gene expression data
Jaskowiak, Pablo Andretta 02 March 2011
The development of microarray technology has made it possible to measure the expression levels of hundreds or even thousands of genes simultaneously under various experimental conditions. The large amount of available data has created a demand for computational methods that allow it to be analyzed in an efficient and automated way. Many of the computational methods employed in gene expression data analysis require the choice of an appropriate proximity measure between genes or samples.
Among the proximity measures available, correlation coefficients have been widely employed because of their ability to capture similarities between the trends of the numeric sequences being compared (genes or samples). This work aims to compare different correlation measures for the three major tasks involved in the analysis of gene expression data: clustering, feature selection, and classification. To this end, this dissertation presents an overview of gene expression data analysis and of the correlation measures considered in the comparison. It also presents empirical results obtained by comparing the correlation coefficients for gene clustering, sample clustering, gene selection for sample classification, and sample classification.
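To make the idea of a correlation coefficient as a proximity measure concrete, here is a minimal sketch comparing two common coefficients on toy expression profiles. The abstract does not list which coefficients the dissertation compares, so Pearson and Spearman are assumptions, as are the function names and the made-up profiles `gene_a`, `gene_b`, and `gene_c`.

```python
import math


def pearson(x, y):
    """Pearson correlation: linear agreement between two profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical expression profiles over five conditions.
gene_a = [1.0, 2.0, 3.0, 4.0, 5.0]
gene_b = [10.0, 20.0, 30.0, 40.0, 50.0]   # same trend, different scale
gene_c = [1.0, 4.0, 9.0, 16.0, 25.0]      # monotonic but nonlinear

r_ab = pearson(gene_a, gene_b)
r_ac = pearson(gene_a, gene_c)
rho_ac = spearman(gene_a, gene_c)
```

Here `gene_a` and `gene_b` are far apart in absolute value yet perfectly correlated, which is the trend-capturing property the abstract refers to. The nonlinear profile `gene_c` shows how the choice of coefficient changes the measured proximity: the rank-based coefficient treats it as identical in trend to `gene_a`, while the linear one does not.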
|
4 |
Automatic methods for assisted recruitment / Méthodes automatiques pour l'aide au recrutement
Cabrera Diego, Luis Adrian 09 December 2015
The widespread use of the Internet and computers has changed several aspects of our daily life, and the way we apply for a job is no exception. Nowadays, the recruitment and selection of applicants for a job is done through information technologies, creating what is known as e-Recruitment. For the last 15 years, researchers in Natural Language Processing have been studying how to improve the performance of recruiters with the help of e-Recruitment. Several systems have been developed in this field, from job and applicant search engines to the automatic ranking of applicants. In the latter case, most of the developed systems compare the résumés of applicants with a job offer. Only one system makes use of résumés from past selection processes to rank new applicants.
In this thesis we study whether and how we can use résumés, without relying on past selection processes, to develop new methods for e-Recruitment systems. More specifically, we start with the automatic processing of a large set of résumés used during real recruitment and selection processes. Then, we analyze and apply different proximity measures to determine which are the most appropriate for studying the résumés of applicants. Next, we introduce an innovative method that relies on Relevance Feedback and on proximity measures applied only to the résumés in order to rank the applicants for a job. Finally, we present the study and application of a statistical measure that allows us to compare, at the same time, the job offer, one specific applicant, and the rest of the applicants, in order to rank all the candidates for a job.
Throughout this thesis we show that résumés contain enough information about the selection process to rank the applicants. Nonetheless, it is important to choose the proximity measures correctly. We also present interesting results from the triple comparison between résumés and job offers. The results obtained in this thesis form a basis for the design of new prototypes of e-Recruitment systems and possibly the beginning of a new way of developing them.
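One way to picture ranking applicants from résumés alone with relevance feedback is sketched below: a recruiter marks a few résumés as accepted or rejected, and the remaining ones are scored by their proximity to each group. This is only an assumed reading of the abstract. Cosine similarity over word counts, the scoring rule, the function names `cosine` and `rerank`, and the toy résumés are illustrative choices, not the thesis's actual method or data.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rerank(resumes, accepted, rejected):
    """Rank unjudged résumés using feedback on already judged ones.

    Score = mean similarity to accepted résumés
            minus mean similarity to rejected résumés.
    """
    vectors = {name: Counter(text.lower().split())
               for name, text in resumes.items()}

    def mean_similarity(name, group):
        if not group:
            return 0.0
        return sum(cosine(vectors[name], vectors[g]) for g in group) / len(group)

    pending = [n for n in resumes if n not in accepted and n not in rejected]
    scores = {n: mean_similarity(n, accepted) - mean_similarity(n, rejected)
              for n in pending}
    return sorted(pending, key=lambda n: scores[n], reverse=True)


# Hypothetical résumés, reduced to a few keywords each.
resumes = {
    "a": "python machine learning engineer",
    "b": "pastry chef bakery",
    "c": "machine learning researcher python",
    "d": "bakery assistant chef",
}
ranking = rerank(resumes, accepted=["a"], rejected=["b"])
```

No job offer text is used here: the ordering depends only on the résumés and the recruiter's feedback. Swapping `cosine` for another proximity measure changes the ranking, which is consistent with the abstract's remark that the measure has to be chosen carefully.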
|
5 |
Modul shlukové analýzy systému pro dolování z dat / Cluster Analysis Module of a Data Mining System
Riedl, Pavel January 2010
This master's thesis deals with the development of a module for a data mining system that is being developed at FIT. The first part describes the general knowledge discovery process and cluster analysis, including cluster validation; it also describes Oracle Data Mining and the algorithms it uses for clustering. It closes with a description of the system and the technologies it uses, such as NetBeans Platform and DMSL. The second part describes the design of a clustering module and of a module used to compare its results. It also covers the visualization of cluster analysis results and presents the results achieved.
|
6 |
Contribution à la sélection de variables par les machines à vecteurs support pour la discrimination multi-classes / Contribution to Variables Selection by Support Vector Machines for Multiclass Discrimination
Aazi, Fatima Zahra 20 December 2016
Technological progress has made it possible to store large amounts of data, in terms of both size (number of observations) and dimensions (number of variables). These data require new statistical processing methods adapted to their characteristics, especially for predictive modeling (data science). In this thesis, we are particularly interested in data with a large number of variables compared to the number of observations.
For these data, reducing the number of initial variables, and hence of dimensions, by selecting an optimal subset is necessary, even indispensable. It reduces complexity, helps to understand the structure of the data, and improves the interpretation of the results and the performance of the prediction or classification model by eliminating noise and/or redundant variables.
More precisely, we are interested in variable selection in the context of supervised learning, and specifically of multi-category discrimination, known as multiclass discrimination. The objective is to propose new variable selection methods for the multiclass discrimination models called Multiclass Support Vector Machines (MSVM).
Two approaches are proposed in this work. The first, presented in a classical context, consists in selecting the optimal subset of variables using the radius-margin bound, an upper bound on the generalization error of MSVM. The second, set in a topological context, uses the notion of neighborhood graphs and the criterion of the degree of topological equivalence in discrimination to identify the relevant variables that make up the optimal subset for the MSVM model.
The evaluation of these two approaches on simulated and real data shows that they can select, from a large number of initial variables, a reduced number of explanatory variables with performance similar to or better than that obtained by competing methods.
|