Global ETD Search

21	Semi-supervised clustering in graphs / Partitionnement semi-supervisé dans les graphes Chatel, David 07 December 2017 (has links) Le partitionnement consiste à rechercher une partition d'éléments, de sorte que les éléments d'un même cluster soient plus similaires que les éléments de différents clusters. Les données proviennent de différentes sources et prennent des formes différentes. L'un des défis consiste à concevoir un système capable de tirer parti des différentes sources de données. Certaines contraintes peuvent être connues sur les données. On peut savoir qu'un objet est d'un certain type ou que deux objets partagent le même type ou sont de types différents. On peut également savoir qu'à l'échelle globale, les différents types d'objets apparaissent avec une fréquence connue. Dans cette thèse, nous nous concentrons sur le partitionnement avec trois types de contraintes: les contraintes d'étiquettes, les contraintes de paires et les contraintes de lois de puissance. Une contrainte d'étiquette spécifie dans quel cluster appartient un objet. Les contraintes par paire spécifient que les paires d'objets doivent ou ne doivent pas partager le même cluster. Enfin, la contrainte de loi de puissance est une contrainte globale qui spécifie que la distribution des tailles de cluster est soumise à une loi de puissance. Nous voulons montrer que l'introduction de la semi-supervision aux algorithmes de clustering peut modifier et améliorer les solutions retournées par des algorithmes de clustering non supervisés. Nous contribuons à cette question en proposant des algorithmes pour chaque type de contraintes. Nos expériences sur les ensembles de données UCI et les jeux de données en langage naturel montrent la bonne performance de nos algorithmes et donnent des indications pour des travaux futurs prometteurs. / Clustering is the task of finding a partition of items, such that items in the same cluster are more similar than items in different clusters. One challenge consists in designing a system capable of taking benefit of the different sources of data. Among the different forms a piece of data can take, the description of an object can take the form of a feature vector: a list of attributes that takes a value. Objects can also be described by a graph which captures the relationships objects have with each others. In addition to this, some constraints can be known about the data. It can be known that an object is of a certain type or that two objects share the same type or are of different types. It can also be known that on a global scale, the different types of objects appear with a known frequency. In this thesis, we focus on clustering with three different types of constraints: label constraints, pairwise constraints and power-law constraint. A label constraint specifies in which cluster an object belong. Pairwise constraints specify that pairs of object should or should not share the same cluster. Finally, the power-law constraint is a cluster-level constraint that specifies that the distribution of cluster sizes are subject to a power-law. We want to show that introducing semi-supervision to clustering algorithms can alter and improve the solutions returned by unsupervised clustering algorithms. We contribute to this question by proposing algorithms for each type of constraints. Our experiments on UCI data sets and natural language processing data sets show the good performance of our algorithms and give hints towards promising future works. Semi-Supervisé Contraintes de paires Contraintes de loi de puissance Contraintes de label 005.741
22	Compression et indexation de séquences annotées / Compressing and indexing labeled sequences Rocher, Tatiana 12 February 2018 (has links) Cette thèse en algorithmique du texte étudie la compression, l'indexation et les requêtes sur un texte annoté. Un texte annoté est un texte sur lequel nous ajoutons des informations. Ce peut être par exemple une recombinaison V(D)J, un marqueur de globules blancs, où le texte est une séquence ADN et les annotations sont des noms de gènes. Le système immunitaire d'une personne se représente par un ensemble de recombinaisons V(D)J. Avec le séquençage à haut débit, on peut avoir accès à des millions de recombinaisons V(D)J qui sont stockées et doivent pouvoir être retrouvées et comparées rapidement.La première contribution de ce manuscrit est une méthode de compression d'un texte annoté qui repose sur le principe du stockage par références. Le texte est découpé en facteurs pointant vers les séquences annotées déjà connues. La seconde contribution propose deux index pour un texte annoté. Ils utilisent une transformée de Burrows-Wheeler indexant le texte ainsi qu'un Wavelet Tree stockant les annotations. Ces index permettent des requêtes efficaces sur le texte, les annotations ou les deux. Nous souhaitons à terme utiliser l'un de ces index pour indexer des recombinaisons V(D)J obtenues dans des services d'hématologie lors du diagnostic et du suivi de patients atteints de leucémie. / This thesis in text algorithm studies the compression, indexation and querying on a labeled text. A labeled text is a text to which we add information. For example: a V(D)J recombination, a marker for lymphocytes, where the text is a DNA sequence and the labels are the genes' names. A person's immune system can be represented with a set of V(D)J recombinations. With high-throughput sequencing, we have access to millions of V(D)J recombinations which are stored and need to be recovered and compared quickly.The first contribution of this manuscript is a compression method for a labeled text which uses the concept of storage by references. The text is divided into sections which point to pre-established labeled sequences. The second contribution offers two indexes for a labeled text. Both use a Burrows-Wheeler transform to index the text and a Wavelet Tree to index the labels. These indexes allow efficient queries on text, labels or both. We would like to use one of these indexes on V(D)J recombinations which are obtained with hematology services from the diagnostic or follow-up of patients suffering from leukemia. Algorithmique du texte Indexation de texte Transformée de Burrows-Wheeler Wavelet Tree 005.741
23	Négociation multi-agents pour la réallocation dynamique de tâches et application au patron de conception MapReduce / Multi-agent negotiation for dynamic task reallocation and application to the MapReduce design pattern Baert, Quentin 13 September 2019 (has links) Le problème Rm\|\|Cmax consiste à allouer un ensemble de tâches à m agents de sorte à minimiser le makespan de l’allocation, c’est-à-dire le temps d’exécution de l’ensemble des tâches. Ce problème est connu pour être NP-dur dès que les tâches sont allouées à deux agents ou plus (m ≥ 2). De plus, il est souvent admis que le coût d’une tâche est précisément estimé pour un agent et que ce coût ne varie pas au cours de l’exécution des tâches. Dans cette thèse, je propose une approche décentralisée et dynamique pour l’amélioration d’une allocation de tâches. Ainsi, à partir d’une allocation initiale et pendant qu’ils exécutent les tâches, les agents collaboratifs initient de multiples enchères pour réallouer les tâches qui restent à exécuter. Ces réallocations sont socialement rationnelles, c’est-à-dire qu’un agent accepte de prendre en charge une tâche initialement allouée à un autre agent si la délégation de cette tâche bénéficie à l’ensemble du système en faisant décroître le makespan. De plus, le dynamisme du procédé permet d’améliorer une allocation malgré une fonction de coût peu précise et malgré les variations de performances qui peuvent survenir lors de l’exécution des tâches. Cette thèse offre un cadre formel pour la modélisation et la résolution multi-agents d’un problème de réallocation de tâches situées. Dans un tel problème, la localité des ressources nécessaires à l’exécution d’une tâche influe sur son coût pour chaque agent du système. À partir de ce cadre, je présente le protocole d’interaction des agents et je propose plusieurs stratégies pour que les choix des agents aient le plus d’impact sur le makespan de l’allocation courante. Dans le cadre applicatif de cette thèse, je propose d’utiliser ce processus de réallocation de tâches pour améliorer le patron de conception MapReduce. Très utilisé pour le traitement distribué de données massives, MapReduce possède néanmoins des biais que la réallocation dynamique des tâches peut aider à contrer. J’ai donc implémenté un prototype distribué qui s’inscrit dans le cadre formel et implémente le patron de conception MapReduce. Grâce à ce prototype, je suis en mesure d’évaluer l’apport du processus de réallocation et l’impact des différentes stratégies d’agent. / The Rm\|\|Cmax problem consists in allocating a set of tasks to m agents in order to minimize the makespan of the allocation, i.e. the execution time of all the tasks. This problem is known to be NP-hard as soon as the tasks are allocated to two or more agents (m ≥ 2). In addition, it is often assumed that the cost of a task is accurately estimated for an agent and that this cost does not change during the execution of tasks. In this thesis, I propose a decentralized and dynamic approach to improve the allocation of tasks. Thus, from an initial allocation and while they are executing tasks, collaborative agents initiate multiple auctions to reallocate the remaining tasks to be performed. These reallocations are socially rational, i.e. an agent agrees to take on a task initially allocated to another agent if the delegation of this task benefits to the entire system by decreasing the makespan. In addition, the dynamism of the process makes it possible to improve an allocation despite an inaccurate cost function and despite the variations of performance that can occur during the execution of tasks. This thesis provides a formal framework for multi-agent modeling and multi-agent resolution of a located tasks reallocation problem. In such a problem, the locality of the resources required to perform a task affects its cost for each agent of the system. From this framework, I present the interaction protocol used by the agents and I propose several strategies to ensure that the choices of agents have the greatest impact on the makespan of the current allocation. In the applicative context of this thesis, I propose to use this tasks reallocation process to improve the MapReduce design pattern. Widely used for the distributed processing of massive data, MapReduce has biases that the dynamic tasks reallocation process can help to counter. I implemented a distributed prototype that fits into the formal framework and implements the MapReduce design pattern. Thanks to this prototype, I am able to evaluate the effectiveness of the reallocation process and the impact of the different agent strategies. Systèmes multi-Agents Négociation multi-Agents Réallocation de tâches MapReduce 005.741
24	Αξιοποίηση υπολογιστικών πόρων Σίψας, Κωνσταντίνος 13 December 2010 (has links) Στα πλαίσια αυτής της εργασίας θα εξετάσουμε την δυνατότητα αξιοποίησης της μονάδας επεξεργασίας γραφικών (GPU) για την εκτέλεση ενός αλγορίθμου πολλαπλασιασμού πίνακα-διανύσματος και τριών αλγορίθμων ταξινόμησης και το κατά πόσο είναι δυνατό να επιταχυνθεί η εκτέλεση του κώδικα αυτού. Η αρχιτεκτονική που μελετήθηκε και αναλύεται στην εργασία ονομάζεται Tesla και αναπτύχθηκε από την εταιρία Nvidia, το μοντέλο και το περιβάλλον ανάπτυξης ονομάζονται Cuda (Compute Unified Device Architecture). / In context of this diploma thesis the capability of exploiting the graphics processing unit (GPU) to execute and accelerate an algorithm for matrix vector multiplication and three sorting algorithms was examined. The architecture which was examined and described in this diploma thesis is Tesla and it was created by Nvidia. The CUDA (Compute Unified Device Architecture) programming environment was used to implement the algorithms. 005.741 Parallel sorting Tesla
25	Προσομοίωση αλγορίθμων διάταξης με εκπαιδευτικό ρομπότ Πουρνάρας, Απόστολος 25 January 2012 (has links) Στη διπλωματική αυτή, παρουσιάζεται μια ρομποτική κατασκευή για την επίδειξη αλγορίθμων ταξινόμησης, με χρήση του εκπαιδευτικού ρομπότ της Lego, το LEGO Mindstorm NXT. Σκοπός αυτής τη επίδειξης είναι να βοηθήσει τους φοιτητές που την παρακολουθούν να κατανοήσουν καλύτερα τους τρόπους εκτέλεσης των αλγορίθμων ταξινόμησης. Το εκπαιδευτικό ρομπότ αυτό αποτελεί εμπορικό προϊόν, μη έχοντας όμως συγκεκριμένη μορφή. Αποτελείται από πολλά πλαστικά μέρη, τα οποία θυμίζουν τα κλασικά τουβλάκια της LEGO αλλά και πολλά άλλα όπως αισθητήρες, κινητήρες, γρανάζια και ρόδες. Με τη χρήση αυτών, κατασκευάστηκε ένα όχημα, το οποίο μπορεί να κινείται μόνο αριστερά-δεξιά, στο οποίο και προσαρτάται ένας αισθητήρας φωτεινότητας. Διαθέτει ακόμη έναν βραχίονα που μπορεί να κινηθεί πάνω-κάτω και στον οποίο προσαρτάται ένας αισθητήρας χρώματος. Οι αριθμοί που καλείται το ρομπότ να ταξινομήσει είναι στην ουσία κύβοι. Οι κύβοι αυτοί, είναι χρωματισμένοι στο επάνω μέρος τους με κάποιο χρώμα ενώ στην πρόσοψή τους έχει εκτυπωθεί ένας αριθμός. Το ρομπότ αναλαμβάνει να αναγνωρίσει με τον αισθητήρα χρώματος το χρώμα του κάθε κύβου και να το ταυτοποιήσει με τον αριθμό στο οποίο αντιστοιχίζεται το χρώμα αυτό. Τον αριθμό δηλαδή που είναι εκτυπωμένος στη πρόσοψη. Για την πλοήγηση του οχήματος εφαρμόζεται μια παραλλαγή της τοπολογικής πλοήγησης. Για την αντιστοίχιση των χρωμάτων με τους αριθμούς χρησιμοποιείται δειγματοληψία χρώματος και στη συνέχεια χρησιμοποιείται 1-προς-1 αντιστοίχιση χρώματος και κατάλληλου αριθμού. Τέλος, οι αλγόριθμοι ταξινόμησης που υλοποιήθηκαν ήταν οι Bubble Sort, Insertion Sort, Heap Sort, Quick Sort. Η επίδειξη των αλγορίθμων γίνεται χρησιμοποιώντας φυσικά τον βραχίονα ο οποίος μετακινεί κατάλληλα τους κύβους. Όμως για την καλλίτερη κατανόηση και για να βοηθηθούν όσοι παρακολουθούν την επίδειξη, παράλληλα της ταξινόμησης με τον βραχίονα, γίνεται χρήση κατάλληλων ηχητικών αλλά και γραπτών μηνυμάτων τα οποία προβάλλονται στην οθόνη που διαθέτει το ΝΧΤ. Τα όσα προβάλλονται στην οθόνη, χρησιμοποιώντας το προγραμματιστικό περιβάλλον Bricx, είναι δυνατόν να προβληθούν σε οθόνη υπολογιστή ή ακόμα και μέσω προβολέα εφόσον ο τελευταίος συνδέεται με υπολογιστή. Τέλος, θεωρούμε ότι το σύστημα που αναπτύχθηκε αποτελεί ένα πολύ καλό εργαλείο που μπορεί να βοηθήσει τον διδάσκοντα στη διδασκαλία των αλγορίθμων ταξινόμησης. Οι φοιτητές μπορούν μέσω της οπτικοποίησης να κατανοήσουν ευκολότερα και γρηγορότερα τους αλγορίθμους. Μελλοντικά ίσως προστεθούν και άλλοι αλγόριθμοι ταξινόμησης, να αναπτυχθεί μια γραφική διεπαφή που θα είναι ανεξάρτητη του Bricx για να προβάλλονται σε κάποια οθόνη τα όσα προβάλλονται χρησιμοποιώντας το Bricx, να χρησιμοποιηθούν διαφορετικοί τρόποι αναγνώρισης αριθμών όπως χρήση αλγορίθμων μορφολογικής επεξεργασίας και τέλος η βηματική ταξινόμηση των αλγορίθμων από κάποιον χειριστή. / -- Εκπαιδευτικά ρομπότ 005.741 Sorting algorithms Educational robots LEGO Mindstorm NXT
26	Appariement de graphes & [et] optimisation dynamique par colonies de fourmis / Graph matching and dynamic optimization by ant colonies Sammoud Aouf, Olfa 21 May 2010 (has links) Cette thèse s’intéresse à une problématique ayant de nombreuses applications pratiques, à savoir la comparaison automatique d’objets et l’évaluation de la similarité. Lorsque les objets sont modélisés par des graphes, ce problème de comparaison automatique d’objets se ramène à un problème d’appariement de graphes, c’est-à-dire, chercher une mise en correspondance entre les sommets des graphes permettant de retrouver le plus grand nombre de caractéristiques communes. Différentes classes existent allant de la plus restrictive à la plus générale. Dans la plus restrictive isomorphisme de (sous-) graphes, il s’agit de chercher un appariement exact entre les sommets des graphes de manière à prouver que les deux graphes possèdent une structure identique ou que l’un d’eux est inclus dans l’autre, un sommet étant apparié avec au plus un sommet. Dans la plus générale (appariement multivoque), l’objectif n’est plus de trouver un appariement exact mais le meilleur appariement, c’est-à-dire, celui qui préserve un maximum de sommets et d’arcs, un sommet pouvant être apparié à un ensemble de sommets. Nous nous intéressons au problème de la recherche du meilleur appariement multivoque, ce problème étant plus général que les problèmes d’appariement restrictifs. Sa résolution est clairement un défi tant par la difficulté du problème que par l’importance de ses applications. Pour relever ce défi, nous proposons d’étudier les capacités de l’optimisation par colonies de fourmis (ACO). Notre étude est menée dans deux contextes : un contexte statique, où le problème est figé, et un contexte dynamique, où les graphes à comparer, les contraintes à respecter ainsi que les critères définissant la qualité des appariements changent régulièrement de sorte que la solution doit être dynamiquement adaptée. Un premier objectif, de cette thèse, est de proposer l’algorithme ACO générique pour la résolution des problèmes d’appariement de graphes. Plusieurs points clés sont étudiés dans cet algorithme, à savoir : l’influence des paramètres sur la qualité des solutions construites, l’influence de la stratégie phéromonale et du facteur heuristique, et l’hybridation avec une technique de recherche locale. Un deuxième objectif est de proposer un algorithme ACO générique pour résoudre des problèmes d’optimisation dynamiques. L’algorithme proposé est appliqué et expérimenté à quelques problèmes dynamiques, à savoir : l’appariement de graphes, le problème du sac à dos multidimensionnel, et le voyageur de commerce / The thesis addresses the problematic of comparing objects and similarity measuring. If objects are described by graphs, so that measuring objects similarity turns into determining graph similarity, i.e., matching graph vertices to identify their common features and their differences. Different classes of graph matching have been proposed going on the most restrictive ones to the most general. In restrictive graph matching (graph or sub-graph isomorphism), the objective is to show graph equivalence or inclusion, a vertex in a graph may be matched with one vertex at most on the other graph. In general graph matching (multivalent matching), the goal is not yet to find an “exact” matching (a matching which preserves all vertices and edges), but to look for a “best” matching (a matching which preserves a maximum number of vertices and edges), a vertex in one graph may be matched with a set of vertices in the other graph. In our work, we consider the problem of searching the best multivalent matching which is a NP-hard optimization problem. More precisely, we propose to investigate the ability if the ant colony optimization meta-heuristic (ACO). Two cases are considered in our study: the static case where the problem remains invariant through time and the dynamic case where graphs to compare constrained to satisfy and the criterions defining matching quality may change over the time, so that solutions must be dynamically adapted to the changes. A first goal of this thesis is to propose a generic ACO algorithm for solving graph matching problems. Different key points, like the pheromonal strategy to be used, the heuristic factor and the use of a local search procedure, are studied. A second goal of this work is to propose a generic ACO algorithm for solving dynamic optimization problems. The proposed algorithm will be applied and experimentally studied on three different dynamic problems: graph matching problem, multi-dimensional knapsack problem and the travelling salesman problem Appariement de graphes Similarité Optimisation dynamique Optimisation par colonies de fourmis Graph matching Similarity Dynamic optimization Ant colonies optimization 005.741
27	Extension automatique de l'annotation d'images pour la recherche et la classification / Automatic image annotation extension for search and classification Bouzayani, Abdessalem 09 May 2018 (has links) Cette thèse traite le problème d’extension d’annotation d’images. En effet, la croissance rapide des archives de contenus visuels disponibles a engendré un besoin en techniques d’indexation et de recherche d’information multimédia. L’annotation d’images permet l’indexation et la recherche dans des grandes collections d’images d’une façon facile et rapide. À partir de bases d’images partiellement annotées manuellement, nous souhaitons compléter les annotations de ces bases, grâce à l’annotation automatique, pour pouvoir rendre plus efficaces les méthodes de recherche et/ou classification d’images. Pour l’extension automatique d’annotation d’images, nous avons utilisé les modèles graphiques probabilistes. Le modèle proposé est un mélange de distributions multinomiales et de mélanges de Gaussiennes où nous avons combiné des caractéristiques visuelles et textuelles. Pour réduire le coût de l’annotation manuelle et améliorer la qualité de l’annotation obtenue, nous avons intégré des retours utilisateur dans notre modèle. Les retours utilisateur ont été effectués en utilisant l’apprentissage dans l’apprentissage, l’apprentissage incrémental et l’apprentissage actif. Pour combler le problème du fossé sémantique et enrichir l’annotation d’images, nous avons utilisé une hiérarchie sémantique en modélisant de nombreuses relations sémantiques entre les mots-clés d’annotation. Nous avons donc présenté une méthode semi-automatique pour construire une hiérarchie sémantique à partie d’un ensemble de mots-clés. Après la construction de la hiérarchie, nous l’avons intégré dans notre modèle d’annotation d’images. Le modèle obtenu avec la hiérarchie est un mélange de distributions de Bernoulli et de mélanges de Gaussiennes / This thesis deals the problem of image annotation extension. Indeed, the fast growth of available visual contents has led a need for indexing and searching of multimedia information methods. Image annotation allows indexing and searching in a large collection of images in an easy and fast way. We wish, from partially manually annotated images databases, complete automatically the annotation of these sets, in order to make methods of research and / or classification of images more efficient. For automatic image annotation extension, we use probabilistic graphical models. The proposed model is based on a mixture of multinomial distributions and mixtures of Gaussian where we have combined visual and textual characteristics. To reduce the cost of manual annotation and improve the quality of the annotation obtained, we have incorporated user feedback into our model. User feedback was done using learning in learning, incremental learning and active learning. To reduce the semantic gap problem and to enrich the image annotation, we use a semantic hierarchy by modeling many semantic relationships between keywords. We present a semi-automatic method to build a semantic hierarchy from a set of keywords. After building the hierarchy, we integrate it into our image annotation model. The model obtained with this hierarchy is a mixture of Bernoulli distributions and Gaussian mixtures Annotation d’images Extension d’annotation Retour utilisateur Hiérarchie sémantique Image annotation Annotation extension User feedback Semantic hierarchy 005.741
28	Prise en compte métrologique de la couleur dans un contexte de classification et d'indexation / Taking metrologically into account colour for classification and image retrieval Chatoux, Hermine 21 May 2019 (has links) Cette thèse aborde la question du traitement correct et complet de la couleur selon les contraintes métrologiques. Le manque d’approches adaptées a justifié la reformulation principaux outils de traitement d’images que sont le gradient, la détection et la description de points d’intérêt. Les approches proposées sont génériques : indépendantes du nombre de canaux d’acquisition (de la couleur à l’hyper-spectral), de la plage spectrale considérée et prenant en compte les courbes de sensibilité spectrales du capteur ou de l’œil.Le full-vector gradient nait de cet objectif métrologique. La preuve de concept est effectuée sur des images couleurs, multi et hyper-spectrales. L’extension développée pour l’analyse de la déficience visuelle ouvre également de nombreuses s perspectives intéressantes pour l’analyse du système visuel humain. Ce gradient est au cœur de la proposition d’un détecteur de points d’intérêt, lui aussi générique. Nous montrons la nécessité d’un choix mathématiquement valide de la distance entre attributs et l’importance de la cohérence de la paire attribut/distance. Une paire attribut/distance complète l’ensemble.Pour chaque développement, nous proposons des protocoles objectifs de validation liés à des générateurs d’images de synthèse explorant toute la complexité spatio-chromatique possible. Notre hypothèse est que la difficulté d’extraction du gradient/des points d’intérêts… est liée à la complexité de discrimination des distributions couleur dans la zone de traitement. Une confrontation aux approches courantes du domaine a été également mise en œuvre. / The PhD thesis objective is to study a colour’s correct and complete processing, respecting metrological constraint. The lack of compatible approaches justified that we reformulate the main image processing tools that are gradient, key point detector and descriptor. The proposed approaches are generic: channel count independent and taking the sensor’s or eye’s sensitivity curves into account.The full-vector gradient is born from this metrological objective. Proof of concept was realised on colour, multi and hyper-spectral images. The extension developed for human vision deficiency opens interesting perspectives to study of the human vision system. This gradient is the centre of the key point detector proposition, also generic.We also showed how necessary was a mathematically valid choice of distance between features. We revealed the importance of the pair feature/distance and completed the work with a pair: RC2O/Kulback-Leibler divergence based on colour differences.For each development, we propose unbiased validation protocols linked to synthetic images generators exploring the most spatial-chromatic complexity possible. Our hypothesis being that the extraction difficulty comes from the discrimination complexity between colour distributions in the processing area. We also compared our proposition to state of the art approaches in recurring datasets/protocols. Couleur Spectral Gradient Détecteur Descripteur Matrice de Gram Divergence de Kullback-Leibler Colour Spectral Gradient Detector Descriptor Gram matrix Kulback-Leibler divergence 005.741
29	Ανάπτυξη συστήματος συστάσεων συνεργατικής διήθησης με χρήση ιεραρχικών αλγορίθμων κατάταξης Κουνέλη, Μαριάννα 01 February 2013 (has links) Σκοπός της παρούσας διπλωματικής διατριβής είναι η μελέτη και ανάπτυξη ενός νέου αλγοριθμικού πλαισίου Συνεργατικής Διήθησης(CF) για την παραγωγή συστάσεων. Η μέθοδος που προτείνουμε, βασίζεται στην εκμετάλλευση της ιεραρχικής διάρθρωσης του χώρου αντικειμένων και πατά διαισθητικά στην ιδιότητα της ``Σχεδόν Πλήρης Αναλυσιμότητας'' (NCD) η οποία είναι συνυφασμένη με τη δομή της πλειοψηφίας των ιεραρχικών συστημάτων. Η Συνεργατική Διήθηση αποτελεί ίσως την πιο πετυχημένη οικογένεια τεχνικών για την παραγωγή συστάσεων. Η μεγάλη απήχησή της στο διαδίκτυο αλλά και η ευρεία εφαρμογή της σε σημαντικά εμπορικά περιβάλλοντα, έχουν οδηγήσει στη σημαντική ανάπτυξη της θεωρίας την τελευταία δεκαετία, όπου μια ευρεία ποικιλία αλγορίθμων και μεθόδων έχουν προταθεί. Ωστόσο, παρά την πρωτοφανή τους επιτυχία οι CF μέθοδοι παρουσιάζουν κάποιους σημαντικούς περιορισμούς συμπεριλαμβανομένης της επεκτασιμότητας και της αραιότητας των δεδομένων. Τα προβλήματα αυτά επιδρούν αρνητικά στην ποιότητα των παραγόμενων συστάσεων και διακυβεύουν την εφαρμοσιμότητα πολλών CF αλγορίθμων σε ρεαλιστικά σενάρια. Χτίζοντας πάνω στη διαίσθηση πίσω από τον αλγόριθμο NCDawareRank - μίας γενικής μεθόδου υπολογισμού διανυσμάτων κατάταξης ιεραρχικά δομημένων γράφων - και της σχετικής με αυτόν έννοιας της NCD εγγύτητας, προβαίνουμε σε μία μοντελοποίηση του συστήματος με τρόπο που φωτίζει τα ενδημικά του χαρακτηριστικά και προτείνουμε έναν νέο αλγοριθμικό πλαίσιο συστάσεων, τον Αλγόριθμο 1. Στο επίκεντρο της προσέγγισής μας είναι η προσπάθεια να συνδυάσουμε τις άμεσες με τις NCD, ``γειτονιές'' των αντικειμένων ώστε να πετύχουμε μεγαλύτερης ακρίβειας χαρακτηρισμό των πραγματικών συσχετισμών μεταξύ των στοιχείων του χώρου αντικειμένων, με σκοπό την βελτίωση της ποιότητας των συστάσεων αλλά και την αντιμετώπιση της εγγενούς αραιότητας και των προβλημάτων που αυτή συνεπάγεται. Για να αξιολογήσουμε την απόδοση της μεθόδου μας υλοποιούμε και εφαρμόζουμε τον Αλγόριθμο 1 στο κλασικό movie recommendation πρόβλημα και παραθέτουμε μια σειρά από πειράματα χρησιμοποιώντας τo MovieLens Dataset. Τα πειράματά μας δείχνουν πως ο Αλγόριθμος 1 με την εκμετάλλευση της ιδέας της NCD εγγύτητας καταφέρνει να πετύχει λίστες συστάσεων υψηλότερης ποιότητας σε σύγκριση με τις άλλες state-of-the-art μεθόδους που έχουν προταθεί στη βιβλιογραφία, σε ευρέως χρησιμοποιούμενες μετρικές (micro- και macro-DOA), αποδεικνύοντας την ίδια στιγμή πως είναι λιγότερο επιρρεπής στα προβλήματα που σχετίζονται με την αραιότητα και έχοντας παράλληλα ανταγωνιστικό προφίλ πολυπλοκότητας και απαιτήσεις αποθήκευσης. / The purpose of this master's thesis is to study and develop a new algorithmic framework for collaborative filtering (CF) to generate recommendations. The method we propose is based on the exploitation of the hierarchical structure of the item space and intuitively ``stands'' on the property of Near Complete Decomposability (NCD) which is inherent in the structure of the majority of hierarchical systems. Collaborative Filtering is one of the most successful families of recommendations methods. The great impact of CF on Web applications, and its wide deployment in important commercial environments, have led to the significant development of the theory, with a wide variety of algorithms and methods being proposed. However, despite their unprecedented success, CF methods present some important limitations including scalability and data sparsity. These problems have a negative impact of the quality of the recommendations and jeopardize the applicability of many CF algorithms in realistic scenarios. Building on the intuition behind the NCDawareRank algorithm and its related concept of NCD proximity, we model our system in a way that illuminates its endemic characteristics and we propose a new algorithmic framework for recommendations, called Algorithm 1. We focus on combining the direct with the NCD `` neighborhoods'' of items to achieve better characterization of the inter-item relations, in order to improve the quality of recommendations and alleviate sparsity related problems. To evaluate the merits of our method, we implement and apply Algorithm 1 in the classic movie recommendation problem, running a number of experiments on the standard MovieLens dataset. Our experiments show that Algorithm 1 manages to create recommendation lists with higher quality compared with other state-of-the-art methods proposed in the literature, in widely used metrics (micro- and macro- DOA), demonstrating at the same time that it is less prone to low density related problems being at the same time very efficient in both complexity and storage requirements. Συστήματα συστάσεων Συνεργατική διήθηση Αραιότητα Αλγόριθμοι κατάταξης Πειράματα 005.741 Recommender systems Collaborative filtering Sparsity Near complete decomposability Ranking algorithms Experiments
30	Les collections volumineuses de documents audiovisuels : segmentation et regroupement en locuteurs / Speaker diarization : the voluminous collections of audiovisual recordings Dupuy, Grégor 03 July 2015 (has links) La tâche de Segmentation et Regroupement en Locuteurs (SRL), telle que définie par le NIST, considère le traitement des enregistrements d’un corpus comme des problèmes indépendants. Les enregistrements sont traités séparément, et le tauxd’erreur global sur le corpus correspond finalement à une moyenne pondérée. Dans ce contexte, les locuteurs détectés par le système sont identifiés par des étiquettes anonymes propres à chaque enregistrement. Un même locuteur qui interviendrait dans plusieurs enregistrements sera donc identifié par des étiquettes différentes selon les enregistrements. Cette situation est pourtant très fréquente dans les émissions journalistiques d’information : les présentateurs, les journalistes et autres invités qui animent une émission interviennent généralement de manière récurrente. En conséquence, la tâche de SRL a depuis peu été considérée dans un contexte plus large, où les locuteurs récurrents doivent être identifiés de manière unique dans tous les enregistrements qui composent un corpus. Cette généralisation du problème de regroupement en locuteurs va de pair avec l’émergence du concept de collection, qui se réfère, dans le cadre de la SRL, à un ensemble d’enregistrements ayant une ou plusieurs caractéristiques communes. Le travail proposé dans cette thèse concerne le regroupement en locuteurs sur des collections de documents audiovisuels volumineuses (plusieurs dizaines d’heures d’enregistrements). L’objectif principal est de proposer (ou adapter) des approches de regroupement afin de traiter efficacement de gros volumes de données, tout en détectant les locuteurs récurrents. L’efficacité des approches proposées est étudiée sous deux aspects : d’une part, la qualité des segmentations produites (en termes de taux d’erreur), et d’autre part, la durée nécessaire pour effectuer les traitements. Nous proposons à cet effet deux architectures adaptées au regroupement en locuteurs sur des collections de documents. Nous proposons une approche de simplification où le problème de regroupement est représenté par une graphe non-orienté. La décompositionde ce graphe en composantes connexes permet de décomposer le problème de regroupement en un certain nombre de sous-problèmes indépendants. La résolution de ces sous-problèmes de regroupement est expérimentée avec deux approches de regroupements différentes (HAC et ILP) tirant parti des récentes avancées en modélisation du locuteur (i-vector et PLDA). / The task of speaker diarization, as defined by NIST, considers the recordings from a corpus as independent processes. The recordings are processed separately, and the overall error rate is a weighted average. In this context, detected speakers are identified by anonymous labels specific to each recording. Therefore, a speaker appearing in several recordings will be identified by a different label in each of the recordings. Yet, this situation is very common in broadcast news data: hosts, journalists and other guests may appear recurrently. Consequently, speaker diarization has been recently considered in a broader context, where recurring speakers must be uniquely identified in every recording that compose a corpus. This generalization of the speaker partitioning problem goes hand in hand with the emergence of the concept of collections, which refers, in the context of speaker diarization, to a set of recordings sharing one or more common characteristics.The work proposed in this thesis concerns speaker clustering of large audiovisual collections (several tens of hours of recordings). The main objective is to propose (or adapt) clustering approaches in order to efficiently process large volumes of data, while detecting recurrent speakers. The effectiveness of the proposed approaches is discussed from two point of view: first, the quality of the produced clustering (in terms of error rate), and secondly, the time required to perform the process. For this purpose, we propose two architectures designed to perform cross-show speaker diarization with collections of recordings. We propose a simplifying approach to decomposing a large clustering problem in several independent sub-problems. Solving these sub-problems is done with either of two clustering approaches which takeadvantage of the recent advances in speaker modeling. Collections de documents audiovisuels Traitement automatique de la parole Speaker Diarization Audiovisual recording collections Unsupervised clustering algorithms Automatic sprech processing 005.741

Search results