  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
71

Classification d’images à partir d’une annotation implicite par le regard / Content based images retrieval based on implicit gaze annotations

Lopez, Stéphanie 04 December 2017 (has links)
One daunting challenge for content-based image retrieval systems is the requirement of an annotated database. To ease the burden of annotation, this thesis proposes a gaze-based interactive image annotation system. The goal is to classify a small set of images according to a target category (binary classification) in order to classify a larger set of unseen images. First, we designed a protocol based on the visual-preference paradigm to collect gaze data from different groups of participants during a category-identification task. Among the gaze features known to be informative about participants' intentions, we derived a Gaze-Based Intention Estimator (GBIE) that is computable in real time and independent of both the participant and the target category. This implicit annotation is better than random annotation but remains inherently uncertain. In a second part, the images annotated by the GBIE from the participants' gaze data are used to classify a larger set of images with an algorithm that handles label uncertainty: P-SVM, which combines classification and regression SVMs. Among several strategies, we determined a relevance criterion for discriminating the most reliable labels, used in the classification part, from the most uncertain labels, used in the regression part. The average accuracy of P-SVM is evaluated in different contexts and can match the performance of a standard classification algorithm trained with true-class labels. These evaluations were first conducted on a standard benchmark, for comparison with state-of-the-art results, and then on a food-image dataset.
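The core idea of learning from uncertain gaze-derived labels can be sketched with a standard SVM. P-SVM itself splits labels into classification and regression parts and is not available off the shelf, so this hypothetical sketch approximates label uncertainty with per-sample weights in scikit-learn instead; the data, confidences and thresholds are all invented for illustration:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two well-separated clusters standing in for "target" vs "non-target" images.
X = np.vstack([rng.normal(-2.0, 0.4, (20, 2)), rng.normal(2.0, 0.4, (20, 2))])
y = np.array([0] * 20 + [1] * 20)

# Hypothetical per-image confidence of the gaze-based label (GBIE output).
conf = rng.uniform(0.5, 1.0, size=40)

# A plain SVM cannot route reliable vs uncertain labels to classification vs
# regression terms as P-SVM does, but it can down-weight the uncertain labels.
clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y, sample_weight=conf)

print(clf.predict([[-2.0, -2.0], [2.0, 2.0]]))  # expected: [0 1]
```

Down-weighting is only a rough stand-in: the thesis's relevance criterion instead partitions labels, sending the reliable ones to a classification loss and the uncertain ones to a regression loss.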
72

Flexible Integration of Molecular-Biological Annotation Data: The GenMapper Approach

Do, Hong-Hai, Rahm, Erhard 12 December 2018 (has links)
Molecular-biological annotation data is continuously being collected, curated and made accessible in numerous public data sources. Integration of this data is a major challenge in bioinformatics. We present the GenMapper system that physically integrates heterogeneous annotation data in a flexible way and supports large-scale analysis on the integrated data. It uses a generic data model to uniformly represent different kinds of annotations originating from different data sources. Existing associations between objects, which represent valuable biological knowledge, are explicitly utilized to drive data integration and combine annotation knowledge from different sources. To serve specific analysis needs, powerful operators are provided to derive tailored annotation views from the generic data representation. GenMapper is operational and has been successfully used for large-scale functional profiling of genes.
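GenMapper's generic representation — objects from different sources linked by explicit associations, from which tailored annotation views are derived — can be caricatured as a small graph traversal. The structures, identifiers and depth parameter below are simplified assumptions, not GenMapper's actual data model:

```python
# Objects from different sources, linked by cross-source associations
# (e.g. a gene linked to a GO function via a UniProt entry).
associations = {
    ("LocusLink", "4763"): [("UniProt", "P21359")],
    ("UniProt", "P21359"): [("GO", "GO:0005096")],
    ("GO", "GO:0005096"): [],
}

def derive_view(obj, depth=2):
    """Collect objects reachable via existing associations, mimicking a
    tailored annotation view derived from the generic representation."""
    view, frontier = [], [obj]
    for _ in range(depth):
        frontier = [t for f in frontier for t in associations.get(f, [])]
        view.extend(frontier)
    return view

print(derive_view(("LocusLink", "4763")))
# [('UniProt', 'P21359'), ('GO', 'GO:0005096')]
```

The point of the sketch is that the associations themselves carry the biological knowledge: integration follows existing links rather than forcing every source into one fixed schema.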
73

Combining machine learning and evolution for the annotation of metagenomics data / La combinaison de l'apprentissage statistique et de l'évolution pour l'annotation des données métagénomiques

Ugarte, Ari 16 December 2016 (has links)
Metagenomics studies microbial communities by analyzing DNA extracted directly from environmental samples, and makes it possible to establish a very extensive catalog of the genes present in those communities. This catalog must be compared against the genes already referenced in databases in order to find similar sequences and thus determine the function of its constituent sequences. In this thesis we developed MetaCLADE, a new methodology that improves the detection of known protein domains in metagenomic and metatranscriptomic sequences. For the development of MetaCLADE, we modified a protein-domain annotation system developed within the Laboratory of Computational and Quantitative Biology, called CLADE (CLoser sequences for Annotations Directed by Evolution) [17]. In general, methods for protein-domain annotation characterize known domains with probabilistic models. These probabilistic models, called Sequence Consensus Models (SCMs), are built from an alignment of homologous sequences belonging to different phylogenetic clades, and they represent the consensus at each position of the alignment. However, when the sequences forming the set of homologs are very divergent, the signals of the SCMs become too weak to be identified and the annotation fails. To solve this problem of annotating highly divergent domains, we used an approach based on the observation that many functional and structural constraints of a protein are not conserved globally across all species, but may be conserved locally within clades. The approach therefore expands the catalog of probabilistic models by creating new models that focus on the characteristics specific to each clade. MetaCLADE, a tool designed to accurately annotate sequences from metagenomic and metatranscriptomic experiments, uses this library to find matches between the models and a database of metagenomic or metatranscriptomic sequences. It then applies a pre-computed filtering step that determines the probability that a prediction is a true hit. This pre-computed step is a learning process that takes the fragmentation of metagenomic sequences into account when classifying them. We have shown that the multi-source approach, combined with a meta-learning strategy that accounts for fragmentation, outperforms current methods.
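The clade-specific idea — scoring a read against many per-clade models instead of one global consensus model — can be sketched with toy position-specific scores. The clades, matrices and sequences below are invented for illustration; MetaCLADE uses full probabilistic domain models, not two-position tables:

```python
# Toy per-clade models: a log-odds-style score for each residue at each position.
clade_models = {
    "firmicutes":     [{"A": 1.0, "G": -1.0}, {"C": 1.0, "T": -1.0}],
    "proteobacteria": [{"A": -1.0, "G": 1.0}, {"C": -1.0, "T": 1.0}],
}

def best_clade(fragment):
    """Score a (possibly fragmented) read against every clade-specific model
    and return the best-scoring clade, mimicking how a library of clade
    models is matched against metagenomic sequences."""
    scores = {
        clade: sum(pos.get(res, 0.0) for pos, res in zip(model, fragment))
        for clade, model in clade_models.items()
    }
    return max(scores, key=scores.get)

print(best_clade("AC"))  # a read too divergent for one model may fit another
```

A read that scores poorly against a global consensus can still score well against the model of the clade it actually comes from, which is exactly the signal the consensus model washes out.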
74

Méthode automatique d’annotations sémantiques et indexation de documents textuels pour l’extraction d’objets pédagogiques / Automatic method of semantic annotation and indexing of textual documents to extract learning objects

Ben Ali, Boutheina 18 January 2014 (has links)
Content analysis has become a necessity for accessing and using information, particularly in the didactics of academic disciplines. We propose SRIDOP, a system for semantic annotation and annotation-based indexing of educational documents, built on the Contextual Exploration method, which associates an annotation of a text segment with a linguistic identifier of a concept, taking into account contextual clues managed by rules. SRIDOP consists of four consecutive modules: (1) automatic segmentation of documents into paragraphs and sentences; (2) annotation according to different search viewpoints (e.g., identification of definitions, examples, exercises), based on a linguistic ontology of concepts associated with each viewpoint (a semantic map) and on linguistic resources (concept indicators, linguistic clues and Contextual Exploration rules); (3) extraction of learning objects; and (4) construction of learning sheets that users can exploit. SRIDOP is evaluated and compared to other systems.
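The Contextual Exploration step — firing an annotation when a concept indicator co-occurs with confirming contextual clues — can be sketched as a tiny rule engine. The indicators, clues and viewpoints below are illustrative stand-ins for SRIDOP's actual linguistic resources:

```python
# One toy rule per viewpoint: a concept indicator plus confirming clues.
rules = [
    {"viewpoint": "definition", "indicator": "is defined as",
     "clues": ["term", "notion"]},
    {"viewpoint": "example", "indicator": "for instance",
     "clues": ["consider", "such as"]},
]

def annotate(sentence):
    """Return viewpoint annotations for which the indicator occurs in the
    sentence AND at least one contextual clue confirms the match."""
    low = sentence.lower()
    found = []
    for rule in rules:
        if rule["indicator"] in low and any(c in low for c in rule["clues"]):
            found.append(rule["viewpoint"])
    return found

print(annotate("The term 'ontology' is defined as a shared conceptualization."))
# ['definition']
```

The clue check is what makes the method "contextual": the indicator alone is not enough to trigger the annotation.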
75

Framework für Ingest mit Annotation technischer Randbedingungen / A Framework For Media Ingestion - Adding Data About Technical Constraints

Herms, Robert, Manthey, Robert, Eibl, Maximilian 25 January 2013 (has links) (PDF)
The process of introducing media into an IT-based system during acquisition is called ingest. Proper handling of media requires the extraction of additional metadata, realized through automatic extraction and analysis as well as manual annotation. We assume that metadata about the technical constraints of the ingest process itself provides a benefit across the media lifecycle; the challenge in this context is automation. This article presents a framework for generating such metadata, developed by the Chair of Media Informatics within the ValidAX project on digitizing various videocassette formats; its architecture and deployment are examined in detail.
76

A Semantics-based User Interface Model for Content Annotation, Authoring and Exploration

Khalili, Ali 26 January 2015 (has links)
The Semantic Web and Linked Data movements, which aim to create, publish and interconnect machine-readable information, have gained traction in recent years. However, the majority of information is still contained in, and exchanged through, unstructured documents such as Web pages, text documents, images and videos. Nor can this be expected to change, since text, images and videos are the natural ways in which humans interact with information. Semantic structuring of content, on the other hand, provides a wide range of advantages over unstructured information: semantically enriched documents facilitate information search and retrieval, presentation, integration, reusability, interoperability and personalization. Looking at the life-cycle of semantic content on the Web of Data, we see considerable progress on the backend side in storing structured content and in linking data and schemata. Nevertheless, the least developed aspect of the semantic content life-cycle is, in our view, the user-friendly manual and semi-automatic creation of rich semantic content. In this thesis, we propose a semantics-based user interface model that aims to reduce the complexity of the underlying technologies for semantic enrichment of content by Web users. By surveying existing tools and approaches for semantic content authoring, we extracted a set of guidelines for designing efficient and effective semantic authoring user interfaces. We applied these guidelines to devise a semantics-based user interface model called WYSIWYM (What You See Is What You Mean), which enables integrated authoring, visualization and exploration of unstructured and (semi-)structured content. To assess the applicability of the WYSIWYM model, we incorporated it into four real-world use cases comprising two general and two domain-specific applications. These use cases address four aspects of the WYSIWYM implementation: (1) its integration into existing user interfaces; (2) its use for lightweight text analytics to incentivize users; (3) crowdsourcing of semi-structured e-learning content; and (4) authoring of semantic medical prescriptions.
78

Design of Business Process Model Repositories : Requirements, Semantic Annotation Model and Relationship Meta-model

Elias, Mturi January 2015 (has links)
Business process management is fast becoming one of the most important approaches for designing contemporary organizations and information systems. A critical component of business process management is business process modelling. It is widely accepted that modelling business processes from scratch is a complex, time-consuming and error-prone task, yet the effort invested in modelling these processes is seldom reused beyond its original purpose. Reuse of business process models has the potential to overcome the challenges of modelling from scratch, and properly populated process model repositories are certainly a step toward supporting such reuse. This thesis starts with the observation that existing process model repositories for supporting process model reuse suffer from several shortcomings that affect their usability in practice. First, most existing repositories are proprietary, so only their owners can enhance or extend them with new models. Second, it is difficult to locate and retrieve relevant process models from a large collection. Third, process models are not goal-related, making it difficult to understand which business goals a given model realizes. Finally, process model repositories lack a clear mechanism for identifying and defining relationships between business processes, so related processes are hard to identify. Following a design science research paradigm, this thesis proposes an open and language-independent process model repository with an efficient retrieval system to support process model reuse. The proposed repository is grounded on four original and interrelated contributions: (1) a set of requirements that a process model repository should satisfy to increase the probability of process model reuse; (2) a context-based process semantic annotation model for semantically annotating process models to enable effective retrieval; (3) a business process relationship meta-model for identifying and defining the relationships between process models in the repository; and (4) an architecture for a process model repository supporting process model reuse. The models and architecture produced in this thesis were evaluated to test their utility, quality and efficacy. The semantic annotation model was evaluated through two empirical studies using controlled experiments; the conclusion drawn from the two studies is that the annotation model improves the searching, navigation and understanding of process models. The process relationship meta-model was evaluated using an informed argument to determine the extent to which it meets the established requirements, and the analysis revealed that it does. An analysis of the architecture against the requirements likewise indicates that the architecture meets them.
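The retrieval benefit of context-based semantic annotation can be sketched by attaching annotations to stored models and filtering on them rather than keyword-searching opaque model files. The annotation fields and example processes below are hypothetical, not taken from the thesis:

```python
# A miniature repository: each process model carries semantic annotations.
repository = [
    {"name": "order-to-cash", "goal": "receive payment", "domain": "sales"},
    {"name": "procure-to-pay", "goal": "acquire goods", "domain": "procurement"},
    {"name": "lead-to-order", "goal": "win customer", "domain": "sales"},
]

def retrieve(**criteria):
    """Return the names of models whose annotations match every criterion,
    e.g. all models serving a given business goal or domain."""
    return [m["name"] for m in repository
            if all(m.get(k) == v for k, v in criteria.items())]

print(retrieve(domain="sales"))  # ['order-to-cash', 'lead-to-order']
```

Because the goal is an explicit annotation, a query like `retrieve(goal="acquire goods")` answers the "which business goal does this model realize?" question that opaque repositories cannot.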
79

GestAnnot: A Paper Annotation Tool for Tablet

Singh, Varinder 12 December 2013 (has links)
Active Reading is an important part of a knowledge worker's activities; it involves highlighting, writing notes, marking with symbols, and so on, in a document. Many Active Reading applications have sought to replicate the affordances of paper through digital-ink-based annotation tools. However, these applications require users to perform numerous steps to use the various annotation tools, which imposes an unnecessary cognitive load and distracts them from their reading tasks. In this thesis we introduce GestAnnot, an Active Reading application for tablet computers that takes a fundamentally different approach, incorporating multi-touch gesture techniques for creating and manipulating annotations on an e-document and thus offering a flexible and easy-to-use annotation solution. Based on a literature review, we designed and developed GestAnnot and then performed lab and field evaluations of the software. In the lab evaluation, GestAnnot performed better than one of the best existing annotation applications in many respects, including the number of steps required. The design was then refined based on the feedback received, and the field evaluation of the improved design helped us understand how the application performs in the real world. From the feedback gathered in both evaluations, we propose a set of design guidelines from which any future Active Reading application could benefit.
80

Parallelizing support vector machines for scalable image annotation

Alham, Nasullah Khalid January 2011 (has links)
Machine learning techniques have facilitated image retrieval by automatically classifying and annotating images with keywords. Among them, Support Vector Machines (SVMs) are used extensively owing to their generalization properties. However, SVM training is a notably computation-intensive process, especially when the training dataset is large. In this thesis, distributed computing paradigms are investigated to speed up SVM training by partitioning a large training dataset into small chunks and processing each chunk in parallel using the resources of a cluster of computers. A resource-aware parallel SVM algorithm is introduced for large-scale image annotation on a cluster of computers, and a genetic-algorithm-based load-balancing scheme is designed to optimize its performance in heterogeneous computing environments. SVM was originally designed for binary classification, but most classification problems arising in domains such as image annotation involve more than two classes; a resource-aware parallel multiclass SVM algorithm for large-scale image annotation on a cluster of computers is therefore also introduced. Combining classifiers leads to a substantial reduction of classification error in a wide range of applications, and SVM ensembles with bagging have been shown to outperform a single SVM in classification accuracy. However, training SVM ensembles is also notably computation-intensive, especially when the number of bootstrap-replicated samples is large. A distributed SVM ensemble algorithm for image annotation is introduced that re-samples the training data by bootstrapping and trains an SVM on each sample in parallel using a cluster of computers. The above algorithms are evaluated in both experimental and simulation environments, showing that the distributed SVM, distributed multiclass SVM and distributed SVM ensemble algorithms reduce training time significantly while maintaining a high level of classification accuracy.
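The bootstrap-and-combine pattern behind the ensemble algorithm can be sketched with a majority vote over per-sample SVMs. On the cluster each fit would run on a different node; this illustrative sketch loops sequentially, uses scikit-learn, and invents its own toy data:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
# Two well-separated clusters as a stand-in for annotated image features.
X = np.vstack([rng.normal(-2.0, 0.5, (30, 2)), rng.normal(2.0, 0.5, (30, 2))])
y = np.array([0] * 30 + [1] * 30)

# Bootstrap re-sampling: each replica would be trained on a separate node.
ensemble = []
for _ in range(5):
    idx = rng.integers(0, len(X), size=len(X))
    ensemble.append(SVC(kernel="linear").fit(X[idx], y[idx]))

def predict(points):
    """Majority vote over the per-sample SVMs."""
    votes = np.stack([m.predict(points) for m in ensemble])
    return (votes.mean(axis=0) > 0.5).astype(int)

print(predict(np.array([[-2.0, -2.0], [2.0, 2.0]])))  # expected: [0 1]
```

The chunked (non-ensemble) variant differs only in that each node sees a disjoint partition of the data rather than a bootstrap replica; the combination step is the same.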