Global ETD Search

1	Semi-automatic Semantic Video Annotation Tool Aydinlilar, Merve 01 December 2011 Semantic annotation of video content is necessary for the indexing and retrieval tasks of video management systems. Currently, it is not possible to extract all high-level semantic information from video data automatically. Video annotation tools help users generate annotations that represent video data. The generated annotations can also be used for testing and evaluating content-based retrieval systems. In this study, a semi-automatic semantic video annotation tool is presented. The generated annotations are in MPEG-7 metadata format to ensure interoperability. With the help of image processing and pattern recognition solutions, the annotation process is partly automated and annotation time is reduced. Annotations can be made for spatio-temporal decompositions of video data. Extraction of low-level visual descriptions is included to obtain complete descriptions.
2	Using formal logic to represent sign language phonetics in semi-automatic annotation tasks Curiel Diaz, Arturo Tlacaélel 23 November 2015 This thesis presents a formal framework for the representation of Signed Languages (SLs), the languages of Deaf communities, in semi-automatic recognition tasks. SLs are complex visuo-gestural communication systems; by using corporal gestures, signers achieve the same level of expressivity held by sound-based languages like English or French. 
However, unlike these, SL morphemes correspond to complex sequences of highly specific body postures, interleaved with postural changes: during signing, signers use several parts of their body simultaneously in order to combinatorially build phonemes. This situation, paired with an extensive use of three-dimensional space, makes them difficult to represent with the tools that already exist in Natural Language Processing (NLP) for vocal languages. For this reason, the current work presents the development of a formal representation framework, intended to transform SL video repositories (corpora) into an intermediate representation layer, where automatic recognition algorithms can work under better conditions. The main idea is that corpora can be described with a specialized Labeled Transition System (LTS), which can then be annotated with logic formulae for its study. A multi-modal logic was chosen as the basis of the formal language: the Propositional Dynamic Logic (PDL). This logic was originally created to specify and prove properties of computer programs. In particular, PDL uses the modal operators [a] and <a> to denote necessity and possibility, respectively. For SLs, a particular variant based on the original formalism was developed: the PDL for Sign Language (PDLSL). With the PDLSL, body articulators (like the hands or head) are interpreted as independent agents; each articulator has its own set of valid actions and propositions, and executes them without influence from the others. The simultaneous execution of different actions by several articulators yields distinct situations, which can be searched over an LTS with formulae, by using the semantic rules of the logic. Together, the use of PDLSL and the proposed specialized data structures could help curb some of the current problems in SL study, notably the heterogeneity of corpora and the lack of automatic annotation aids. 
In the same vein, this may not only increase the size of the available datasets, but even extend previous results to new corpora; the framework inserts an intermediate representation layer which can serve to model any corpus, regardless of its technical limitations. With this, annotation becomes possible by defining with formulae the characteristics to annotate. Afterwards, a formal verification algorithm may be able to find those features in corpora, as long as they are represented as consistent LTSs. Finally, the development of the formal framework led to the creation of a semi-automatic annotator based on the presented theoretical principles. Broadly, the system receives an untreated corpus video, converts it automatically into a valid LTS (by way of some predefined rules), and then verifies human-created PDLSL formulae over the LTS. The final product is an automatically generated sub-lexical annotation, which can later be corrected by human annotators for use in other areas such as linguistics. Keywords: Sign Language; Propositional Dynamic Logic; Automatic Annotation; Natural Language Processing
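The abstract above describes verifying formulae with the operators [a] and <a> over a Labeled Transition System. As a rough illustration only, the following Python sketch evaluates such formulae with the standard semantics (<a>φ holds where some a-successor satisfies φ; [a]φ holds where every a-successor does). The tuple encoding of formulae, the `LTS` class, and the state, action, and proposition names are all hypothetical and are not taken from the thesis or from PDLSL itself.

```python
from dataclasses import dataclass, field


@dataclass
class LTS:
    """Toy labeled transition system (hypothetical structure, not the thesis's)."""
    labels: dict                                  # state -> set of true propositions
    edges: dict = field(default_factory=dict)     # (state, action) -> set of successors

    def successors(self, state, action):
        return self.edges.get((state, action), set())


def holds(lts, state, formula):
    """Evaluate a formula, written as a nested tuple, at one state."""
    op = formula[0]
    if op == "prop":
        return formula[1] in lts.labels[state]
    if op == "not":
        return not holds(lts, state, formula[1])
    if op == "and":
        return holds(lts, state, formula[1]) and holds(lts, state, formula[2])
    if op == "dia":   # <a>phi: some a-successor satisfies phi
        _, action, sub = formula
        return any(holds(lts, t, sub) for t in lts.successors(state, action))
    if op == "box":   # [a]phi: every a-successor satisfies phi (vacuously true if none)
        _, action, sub = formula
        return all(holds(lts, t, sub) for t in lts.successors(state, action))
    raise ValueError(f"unknown operator: {op!r}")


def annotate(lts, formula):
    """Return the sorted list of states where the formula holds."""
    return sorted(s for s in lts.labels if holds(lts, s, formula))


# Invented three-posture example: one articulator action links the postures.
demo = LTS(
    labels={
        "s0": {"right_hand_open"},
        "s1": {"right_hand_closed"},
        "s2": {"right_hand_open", "hands_touch"},
    },
    edges={
        ("s0", "move_right_hand"): {"s1"},
        ("s1", "move_right_hand"): {"s2"},
    },
)

touch_next = ("dia", "move_right_hand", ("prop", "hands_touch"))
print(annotate(demo, touch_next))  # ['s1']
```

In this toy setting the list of matching states plays the role of an annotation that a human could later correct; a real system would also need the video-to-LTS conversion, which is not sketched here.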
3	A framework for semantic web implementation based on context-oriented controlled automatic annotation. Hatem, Muna Salman January 2009 The Semantic Web is the vision of the future Web. Its aim is to enable machines to process Web documents in a way that makes it possible for computer software to "understand" the meaning of the document contents. Each document on the Semantic Web is to be enriched with meta-data that express the semantics of its contents. Many infrastructures, technologies and standards have been developed and have proven their theoretical use for the Semantic Web, yet very few applications have been created. Most of the current Semantic Web applications were developed for research purposes. This project investigates the major factors restricting the widespread adoption of Semantic Web applications. We identify the two most important requirements for a successful implementation as the automatic production of the semantically annotated document, and the creation and maintenance of a semantic-based knowledge base. This research proposes a framework for Semantic Web implementation based on context-oriented controlled automatic Annotation; for short, we call the framework the Semantic Web Implementation Framework (SWIF) and the system that implements this framework the Semantic Web Implementation System (SWIS). The proposed architecture provides for a Semantic Web implementation of stand-alone websites that automatically annotates Web pages before they are uploaded to the Intranet or Internet, and maintains persistent storage of Resource Description Framework (RDF) data for both the domain memory, denoted by Control Knowledge, and the meta-data of the Web site's pages. 
We believe that the presented implementation of the major parts of SWIS introduces a system that is competitive with current state-of-the-art Annotation tools and knowledge management systems, because it handles input documents in the context in which they are created, in addition to automatically learning and verifying knowledge using only the available computerized corporate databases. In this work, we introduce the concept of Control Knowledge (CK), which represents the application's domain memory, and use it to verify the extracted knowledge. Learning is based on the number of occurrences of the same piece of information in different documents. We introduce the concept of Verifiability in the context of Annotation by comparing the extracted text's meaning with the information in the CK and by using the proposed database table Verifiability_Tab. We use the linguistic concept of Thematic Role in investigating and identifying the correct meaning of words in text documents; this helps correct relation extraction. The verb lexicon used contains the argument structure of each verb together with the thematic structure of the arguments. We also introduce a new method to chunk conjoined statements and identify the missing subject of the produced clauses. We use the semantic class of verbs, which relates a list of verbs to a single property in the ontology; this helps in disambiguating the verb in the input text to enable better information extraction and Annotation. Consequently, we propose the following definition for the annotated document, or what is sometimes called the 'Intelligent Document': 'The Intelligent Document is the document that clearly expresses its syntax and semantics for human use and software automation'. This work introduces a promising improvement to the quality of the automatically generated annotated document and the quality of the automatically extracted information in the knowledge base. 
Our approach in the area of using Semantic Web technology opens new opportunities for diverse areas of applications. E-Learning applications can be greatly improved and become more effective. Keywords: Semantic web; Meta-data; Control Knowledge (CK); Knowledge management systems; Intelligent Document; Meaning
4	Semantic content analysis for effective video segmentation, summarisation and retrieval. Ren, Jinchang January 2009 This thesis focuses on four main research themes, namely shot boundary detection, fast frame alignment, activity-driven video summarisation, and highlights-based video annotation and retrieval. A number of novel algorithms have been proposed to address these issues, which can be highlighted as follows. Firstly, accurate and robust shot boundary detection is achieved through modelling of cuts into sub-categories and appearance-based modelling of several gradual transitions, along with some novel features extracted from compressed video. Secondly, fast and robust frame alignment is achieved via the proposed subspace phase correlation (SPC) and an improved sub-pixel strategy. The SPC is proved to be insensitive to zero-mean noise, and its gradient-based extension is robust even to non-zero-mean noise and can be used to deal with non-overlapped regions for robust image registration. Thirdly, hierarchical modelling of rush videos using formal language techniques is proposed, which can guide the modelling and removal of several kinds of junk frames as well as adaptive clustering of retakes. With an extracted activity level measurement, shots and sub-shots are detected for content-adaptive video summarisation. Fourthly, highlights-based video annotation and retrieval is achieved, in which statistical modelling of skin pixel colours, knowledge-based shot detection, and improved determination of camera motion patterns are employed. Within these proposed techniques, one important principle is to integrate various kinds of feature evidence and to incorporate prior knowledge in modelling the given problems. A high-level hierarchical representation is extracted from the original linear structure for effective management and content-based retrieval of video data. 
As most of the work is implemented in the compressed domain, one additional benefit is the high efficiency achieved, which will be useful for many online applications. / EU IST FP6 Project. Keywords: Semantic content analysis; Shot boundary detection; Video segmentation; Subspace phase correlation; Frame alignment; Video summarisation; Hierarchical modelling; Adaptive clustering; Content-based retrieval; Automatic annotation; TRECVID; Digital video processing
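The frame alignment step mentioned above builds on phase correlation. The sketch below shows only the plain, textbook form of phase correlation on a one-dimensional signal with a circular shift; it is not the thesis's subspace phase correlation (SPC) and has no sub-pixel refinement. The function names and the naive DFT are illustrative choices made to keep the example free of third-party dependencies.

```python
import cmath


def dft(signal, inverse=False):
    """Naive O(n^2) discrete Fourier transform; adequate for a short demo."""
    n = len(signal)
    sign = 1 if inverse else -1
    out = [
        sum(signal[k] * cmath.exp(sign * 2j * cmath.pi * k * m / n) for k in range(n))
        for m in range(n)
    ]
    return [value / n for value in out] if inverse else out


def phase_correlation_shift(reference, moved):
    """Estimate the circular shift d with moved[i] == reference[(i - d) % n]."""
    ref_spectrum = dft(reference)
    moved_spectrum = dft(moved)
    cross = []
    for f, g in zip(ref_spectrum, moved_spectrum):
        product = g * f.conjugate()
        magnitude = abs(product)
        # Keep only the phase; drop bins that carry no energy.
        cross.append(product / magnitude if magnitude > 1e-12 else 0j)
    correlation = dft(cross, inverse=True)
    # The inverse transform of a pure phase ramp peaks at the shift.
    return max(range(len(correlation)), key=lambda i: correlation[i].real)


row = [0, 1, 4, 2, 0, 0, 3, 1]
shifted = [row[(i - 3) % len(row)] for i in range(len(row))]
print(phase_correlation_shift(row, shifted))  # 3
```

For images the same idea is applied with a two-dimensional transform, normally through an FFT library rather than the naive transform used here.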
7	Apprentissage de représentations musicales à l'aide d'architectures profondes et multiéchelles Hamel, Philippe 05 1900 Machine learning (ML) is an important tool in the field of music information retrieval (MIR). Many MIR tasks can be solved by training a classifier over a set of features. For MIR tasks based on music audio, it is possible to extract features from the audio with signal processing techniques. However, some musical aspects are hard to extract with simple heuristics. To obtain richer features, we can use ML to learn a representation from the audio. 
These learned features can often improve performance for a given MIR task. In order to learn interesting musical representations, it is important to consider the particular aspects of music audio when building learning models. Given the temporal and spectral structure of music audio, deep and multi-scale representations are particularly well suited to represent music. This thesis focuses on learning representations from music audio. Deep and multi-scale models that improve the state of the art for tasks such as instrument recognition, genre recognition and automatic annotation are presented. Keywords: Machine learning; Music information retrieval; Music audio analysis; Automatic annotation; Deep learning; Multiscale learning
8	Méthode automatique d’annotations sémantiques et indexation de documents textuels pour l’extraction d’objets pédagogiques / Automatic method of semantic annotation and indexing of textual documents to extract learning objects Ben Ali, Boutheina 18 January 2014 Content analysis is becoming a necessity for accessing and using information, particularly in the field of subject didactics. We propose a system, SRIDOP, for the semantic annotation of learning documents and for indexing them from these annotations. It is based on the Contextual Exploration method, which associates an annotation of a segment with a linguistic identifier of a concept, taking into account contextual clues managed by rules. 
SRIDOP is composed of four consecutive modules: (1) automatic segmentation of documents into paragraphs and sentences; (2) annotation according to different search viewpoints (e.g. identification of definitions, examples, exercises, etc.), based on a linguistic ontology of concepts associated with a search viewpoint (semantic map) and on linguistic resources (concept indicators, linguistic clues and Contextual Exploration rules); (3) extraction of learning objects; (4) creation of learning sheets usable by users. SRIDOP is evaluated and compared to other systems. Keywords: Learning objects; Automatic annotation; Indexing based on annotation; Contextual exploration; Learning sheets; Annotation system evaluation
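The first three modules listed above (segmentation, rule-based annotation, extraction) can be pictured with a very small Python sketch. It assumes that a rule pairs an indicator pattern with optional contextual clues, and that a sentence is annotated when the indicator matches and, if clues are given, at least one clue matches too. The rules, labels, and sample text are invented for illustration; they are not SRIDOP's actual linguistic resources, and the real Contextual Exploration rules are likely richer.

```python
import re

# Hypothetical rules: one indicator pattern plus optional contextual clues.
RULES = [
    {"label": "definition", "indicator": r"\bis defined as\b|\bis called\b", "clues": []},
    {"label": "example", "indicator": r"\bfor example\b|\bfor instance\b", "clues": []},
    {"label": "exercise", "indicator": r"\b(compute|solve|prove)\b", "clues": [r"\bexercise\b"]},
]


def segment(text):
    """Module 1: split text into (paragraph index, sentence) pairs."""
    sentences = []
    for p_idx, paragraph in enumerate(re.split(r"\n\s*\n", text.strip())):
        for sentence in re.split(r"(?<=[.!?])\s+", paragraph.strip()):
            if sentence:
                sentences.append((p_idx, sentence))
    return sentences


def matches(rule, sentence):
    """A rule fires if its indicator is present and any required clue is found."""
    if not re.search(rule["indicator"], sentence, re.IGNORECASE):
        return False
    clues = rule["clues"]
    return not clues or any(re.search(c, sentence, re.IGNORECASE) for c in clues)


def annotate(text):
    """Module 2: attach the labels of all firing rules to each sentence."""
    return [
        (p_idx, sentence, [r["label"] for r in RULES if matches(r, sentence)])
        for p_idx, sentence in segment(text)
    ]


def extract(text, label):
    """Module 3: collect the sentences annotated with one label."""
    return [s for _, s, labels in annotate(text) if label in labels]


SAMPLE = (
    "A prime number is defined as an integer greater than 1 with no proper divisors. "
    "For example, 7 is prime.\n\n"
    "Exercise: prove that 2 is the only even prime."
)

print(extract(SAMPLE, "definition"))
```

Requiring a clue in addition to the indicator is what keeps a sentence such as "Solve it later." from being labelled as an exercise in this toy setup.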
10	支援數位人文研究之文本自動標註系統發展與使用評估研究 / Development and evaluation of an automatic text annotation system for supporting digital humanities research 劉鎮宇, Liu, Chen Yu Unknown Date In traditional humanities research, most humanities scholars have worked mainly with paper-based texts such as rare ancient books and historical documents. With the arrival of the information society, however, many research institutes have digitized these paper materials and built digital archive databases, greatly changing the humanities research environment and the channels for acquiring knowledge; text research based on digital reading has become an inevitable trend. For this reason, this study develops an “automatic text annotation system” to support digital humanities research. Using the concept of Linked Data, resources from different databases are gathered and integrated to annotate texts automatically. This lets users consult resources from other databases immediately while interpreting a text, and provides a friendly reading interface with text annotations that helps humanities scholars interpret material through reading. 
Using an experimental research method, this study compares the “automatic text annotation system” with the “MARKUS semi-automatic text annotation system” to determine whether they differ significantly in reading achievement and technology acceptance when supporting humanities scholars in interpreting text data. Semi-structured in-depth interviews are also conducted to understand humanities scholars’ opinions and perceptions of the “automatic text annotation system”, and the correlations among reading achievement, technology acceptance, and user behavior processes for the system are analyzed. The experimental results show that reading achievement with the automatic text annotation system is higher than with the MARKUS semi-automatic text annotation system, but the difference is not statistically significant. The technology acceptance analysis shows that acceptance of the automatic text annotation system is significantly better than that of MARKUS. According to the interviews, the reading interface of the automatic text annotation system is simple and clear and is more suitable for reading than that of MARKUS; whether the reading interface is easy to use and useful is a key factor in whether humanities scholars accept a system for supporting digital humanities research. A comparison of similar functions in the two systems also shows that vocabulary lookup, linking to source websites, and adding annotations are more intuitive and easier to use in the automatic text annotation system than in MARKUS. In addition, humanities scholars generally consider sentence segmentation more important than automatic word segmentation, and among the linked source databases they find Moedict the most helpful. 
Finally, there is no significant correlation between reading achievement and user behavior processes with the automatic text annotation system. Keywords: Digital humanities; Automatic annotation system; Automatic Chinese word segmentation; Linked data
