51 |
Evaluation of clusterings of gene expression data
Lubovac, Zelmina, January 2000
Recent literature has investigated the use of different clustering techniques for analysis of gene expression data. For example, self-organizing maps (SOMs) have been used to identify gene clusters of clear biological relevance in human hematopoietic differentiation and the yeast cell cycle (Tamayo et al., 1999). Hierarchical clustering has also been proposed for identifying clusters of genes that share common roles in cellular processes (Eisen et al., 1998; Michaels et al., 1998; Wen et al., 1998). Systematic evaluation of clustering results is as important as generating the clusters. However, this is a difficult task, which is often overlooked in gene expression studies. Several gene expression studies claim success of the clustering algorithm without showing a validation of complete clusterings, for example Ben-Dor and Yakhini (1999) and Törönen et al. (1999).

In this dissertation we propose an evaluation approach based on a relative entropy measure that uses additional knowledge about genes (gene annotations) besides the gene expression data. More specifically, we use gene annotations in the form of an enzyme classification hierarchy to evaluate clusterings. This classification is based on the main chemical reactions that are catalysed by enzymes. Furthermore, we evaluate clusterings with pure statistical measures of cluster validity (compactness and isolation).

The experiments include applying two types of clustering methods (SOMs and hierarchical clustering) on a data set for which good annotation is available, so that the results can be partly validated from the viewpoint of biological relevance.

The evaluation of the clusters indicates that clusters obtained from hierarchical average linkage clustering have much higher relative entropy values and lower compactness and isolation compared to SOM clusters. Clusters with high relative entropy often contain enzymes that are involved in the same enzymatic activity. On the other hand, the compactness and isolation measures do not seem to be reliable for evaluation of clustering results.
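
The abstract does not give the exact form of the relative entropy measure, so the following is only a minimal sketch of one plausible reading: the Kullback-Leibler divergence between the distribution of enzyme classes inside a cluster and their distribution over the whole data set. The function name `relative_entropy` and the labels `EC1` and `EC2` are hypothetical and chosen for illustration; they are not taken from the dissertation.

```python
import math
from collections import Counter


def relative_entropy(cluster_labels, background_labels):
    """Return D(P || Q) in bits.

    P is the annotation-class distribution inside one cluster and
    Q is the distribution over the whole data set (an assumed
    formulation, not necessarily the one used in the thesis).
    """
    p_counts = Counter(cluster_labels)
    q_counts = Counter(background_labels)
    n_p = len(cluster_labels)
    n_q = len(background_labels)
    total = 0.0
    for label, count in p_counts.items():
        if q_counts[label] == 0:
            # A cluster is a subset of the data set, so this should not happen.
            raise ValueError(f"label {label!r} missing from background")
        p = count / n_p
        q = q_counts[label] / n_q
        total += p * math.log2(p / q)
    return total


# Hypothetical toy data: eight genes, two enzyme classes.
background = ["EC1"] * 4 + ["EC2"] * 4
pure_cluster = ["EC1"] * 3        # every gene shares one class
mixed_cluster = ["EC1", "EC2"]    # mirrors the background mix

print(relative_entropy(pure_cluster, background))
print(relative_entropy(mixed_cluster, background))
```

Under this reading, a cluster dominated by one enzyme class scores higher than a cluster whose class mix matches the background, which is consistent with the abstract's observation that clusters with high relative entropy often contain enzymes involved in the same activity.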
|
52 |
Easing information extraction on the web through automated rules discovery
Ortona, Stefano, January 2016
The advent of the era of big data on the Web has made automatic web information extraction an essential tool in data acquisition processes. Unfortunately, automated solutions are in most cases more error prone than those created by humans, resulting in dirty and erroneous data. Automatic repair and cleaning of the extracted data is thus a necessary complement to information extraction on the Web. This thesis investigates the problem of inducing cleaning rules on web extracted data in order to (i) repair and align the data w.r.t. an original target schema, and (ii) produce repairs that are as generic as possible, so that different instances can benefit from them. The problem is addressed from three different angles: replace cross-site redundancy with an ensemble of entity recognisers; produce general repairs that can be encoded in the extraction process; and exploit entity-wide relations to infer common knowledge on extracted data. First, we present ROSeAnn, an unsupervised approach to integrate semantic annotators and produce a unified and consistent annotation layer on top of them. Both the diversity in vocabulary and widely varying accuracy justify the need for middleware that reconciles different annotator opinions. Considering annotators as "black-boxes" that do not require per-domain supervision allows us to recognise semantically related content in web extracted data in a scalable way. Second, we show in WADaR how annotators can be used to discover rules to repair web extracted data. We study the problem of computing joint repairs for web data extraction programs and their extracted data, providing an approximate solution that requires no per-source supervision and proves effective across a wide variety of domains and sources. The proposed solution is effective not only in repairing the extracted data, but also in encoding such repairs in the original extraction process. Third, we investigate how relationships among entities can be exploited to discover inconsistencies and additional information. We present RuDiK, a disk-based scalable solution to discover first-order logic rules over RDF knowledge bases built from web sources. Our approach does not limit its search space to rules that rely on "positive" relationships between entities, as is the case with traditional mining of constraints. On the contrary, it extends the search space to also discover negative rules, i.e., patterns that lead to contradictions in the data.
|
54 |
Bibliosémantique : une technique linguistique et informatique par exploration contextuelle / Bibliosémantic : a linguistic and computational method by contextual exploration
Bertin, Marc, 21 January 2011
We define bibliosemantics as belonging to both computing and linguistics. Its objectives are essentially the same as those advocated by scientometrics, informetrics and bibliometrics, namely to classify, organise and evaluate. The core of our implementation relies on corpora semantically annotated by the EXCOM platform. Applying the Contextual Exploration method has led to a computer implementation of bibliosemantics that is based on discourse semantics rather than being a purely metric application, in the context of this study of bibliographic references. Recognising indexed or abbreviated references within a corpus of scientific articles makes it possible to identify the textual segments that are candidates for annotation. The thesis also presents discourse markers organised as a semantic map; these constitute the necessary linguistic resources and allow all of the semantic processing to be automated. To provide a user-friendly navigation interface suited to our problem, the system has been developed as a web service. New documentary products, such as an enriched bibliographic record, have been implemented to make it easier for the user to exploit the annotations. Finally, we propose an evaluation of the system and describe the protocol used. The work concludes with a number of recommendations, notably the setting up of a monitoring unit.
|
55 |
Editor anotací pro WYSIWYG textové editory napsané v jazyce JavaScript / Annotations editor for WYSIWYG JavaScript-Based Text Editors
Kleban, Martin, January 2011
This master's thesis analyses the possibilities for developing plugins for JavaScript-based WYSIWYG text editors (TinyMCE, CKEditor, NicEdit, jWYSIWYG) and describes the design of an annotations editor as a plugin for the chosen editors. A structured annotations editor was implemented for TinyMCE; it includes its own user interface implementation, built without any complex universal libraries. The knowledge gained from the analysis of WYSIWYG editors and from the implementation of the annotations editor is summarised in the appendix.
|
56 |
A Study of the Effectiveness of Annotations in Improving the Listening Comprehension of Intermediate ESL Learners
Rocque, Ryan K., 19 April 2008
This study seeks to answer the age-old question of what kind of input is best for ESL learners, but it approaches the question from a new perspective. There are many options when it comes to a choice of curriculum, both in terms of the method that is used and the materials that are available. Feature film is one important resource that has received increased attention in recent years. Curriculum specialists and teachers are incorporating film clips into instruction to reinforce a grammar point, to teach culture, or to motivate learners. Yet adequate research does not yet exist that demonstrates how film can be used effectively. One possible solution explored in this study was the use of feature films in a self-study environment. Can annotations of feature films, in this case definitions and pictures, improve students' listening comprehension when students interact with them independently of a teacher? Few studies look at how annotations are used in this way. Overall, this study found that the intermediate ESL learners who participated showed significant gains in their test scores compared with the control group, which did not view the film. However, in comparing the three groups, the scores for students using annotations and students not using annotations were not significantly different, perhaps as a result of the small sample size. Nevertheless, this study provides many insights into the current research and can offer important guidance for future research in this area. Listening comprehension is a vital subject for research, and film is an excellent tool to enhance that research.
|
57 |
Addressing Semantic Interoperability and Text Annotations. Concerns in Electronic Health Records using Word Embedding, Ontology and Analogy
Naveed, Arjmand, January 2021
Electronic Health Records (EHRs) create a huge number of databases that are updated dynamically. A major goal of interoperability in healthcare is to facilitate the seamless exchange of healthcare-related data and to provide an environment that supports interoperability and the secure transfer of data. Healthcare organisations face difficulties in exchanging patients' healthcare information, laboratory reports and similar data because of a lack of semantic interoperability. Hence, there is a need for semantic web technologies that address healthcare interoperability problems by enabling healthcare standards from various healthcare entities (doctors, clinics, hospitals, etc.) to exchange data and its semantics in a form that can be understood by both machines and humans. A framework with a similarity analyser is therefore proposed in this thesis to deal with semantic interoperability. A further consideration was the use of word embedding and ontology for knowledge discovery. In the medical domain, the main challenge for a medical information extraction system is to find the required information by considering explicit and implicit clinical context with a high degree of precision and accuracy. Many methods and techniques have been presented for the semantic similarity of medical text at different levels (concept, sentence and document level), but I made sure that the semantic content of a text includes the correct meaning of words and sentences. A comparative analysis of two approaches, ontology followed by word embedding and vice versa, was carried out to determine which approach gives higher semantic similarity. Using the Kidney Cancer dataset as a use case, I concluded that each approach works better in different circumstances. However, the approach in which ontology is applied first to enrich the data, followed by word embedding, has shown better results. Apart from enriching the EHR, extracting relevant information is also challenging. To address this challenge, the concept of analogy has been applied to explain similarities between two different contents, as analogies play a significant role in understanding new concepts. Analogy helps healthcare professionals to communicate with patients effectively and helps patients understand their disease and treatment. I therefore used analogies in this thesis to support the extraction of relevant information from medical text. Since accessing EHRs has been challenging, tweet text is used as an alternative to EHRs, as social media has emerged as a relevant data source in recent years. An algorithm has been proposed to analyse medical tweets based on analogous words. The results have been used to validate the proposed methods. Two experts from the medical domain have given their views on the proposed methods in comparison with a similar method named SemDeep. The quantitative and qualitative results have shown that the proposed analogy-based method brings diversity and is helpful in analysing a specific disease or in text classification.
|
58 |
Sémantická anotace textu / Semantic Annotation of Text
Dytrych, Jaroslav, January 2017
This thesis deals with intelligent systems that support the semantic annotation of text. It discusses the motivation for creating such systems and the state of the art in the areas where they are used. The thesis also describes a newly proposed and implemented annotation system that provides advanced functions for semantic filtering and presents alternative annotation suggestions in a unique way. The results of the completed experiments show the advantages of the proposed solution. They also show that the user interface of an annotation tool affects the annotation process. The information displayed for the task of disambiguating ambiguous entity names was optimised, and the proposed methods for speeding up annotation and increasing the quality of the created annotations were evaluated experimentally. A comparison with the general-purpose tool Protégé demonstrated the benefits of the created system for collaborative creation of ontologies that should be anchored in the text. In the conclusion, all achieved results are analysed and summarised.
|
59 |
Excom‑2 : plateforme d’annotation automatique de catégories sémantiques : conception, modélisation et réalisation informatique : applications à la catégorisation des citations en arabe et en français / Excom-2 : a cross-language platform for automatic annotations according to semantic points of view : example of treatment : quotations categorization in Arabic and Frensh
Alrahabi, Al Moatasem, 29 January 2010
We propose a platform for semantic annotation called "EXCOM-2". Based on the "Contextual Exploration" method, it performs automatic annotation of textual segments across a wide range of languages by analysing surface forms in their context. Texts are processed according to discursive "points of view", whose values are organised into a "semantic map". The annotation is based on a set of linguistic rules, written manually by an analyst, that identify the textual representations underlying the different semantic categories of the map. Through two types of interface (analyst or end user), the system provides a complete pipeline of automatic text processing that comprises segmentation, annotation and other post-processing functionalities. Annotated documents can be used, for instance, in systems for information retrieval, monitoring, classification or automatic summarisation. As an example application, we propose a system for the automatic identification and categorisation of reported speech in Arabic and French, based on an analysis of the linguistic markers of enunciative modalities in direct reported speech.
|
60 |
Théorie et pratique de la construction humaine supervisée du sens
Rouane, Khalid, January 2004
Thesis digitised by the Direction des bibliothèques of the Université de Montréal.
|