21

Contribution to complex visual information processing and autonomous knowledge extraction: application to autonomous robotics

Ramik, Dominik Maximilián 10 December 2012 (has links) (PDF)
The work accomplished in this thesis concerns the development of an autonomous machine cognition system. The proposed solution rests on the assumption that curiosity is what motivates a cognitive system to acquire new knowledge. Two distinct kinds of curiosity are identified, in line with the human cognitive system, and on this basis I build a two-level cognitive architecture: I identify its lower level with a perceptual saliency mechanism, while the higher level performs knowledge acquisition from observation of, and interaction with, the environment. This thesis makes the following contributions. A) An investigation of the state of the art in autonomous knowledge acquisition. B) The realization of the lower cognitive level of this system, which implements the perceptual curiosity mechanism through a novel algorithm for salient object detection and learning that is fast and robust in real-world conditions. C) The realization of the higher cognitive level through a general framework for knowledge acquisition from observation of, and interaction with, the environment, including humans. Based on epistemic curiosity, this high-level cognitive system enables a machine (e.g. a robot) to be the actor of its own learning. An important consequence of this system is the possibility of conferring high-level multimodal cognitive capabilities on robots, increasing their autonomy in real-world (human) environments. D) The realization of the proposed strategy in the context of autonomous robotics. The studies and experimental validations carried out notably confirmed that our approach increases the autonomy of robots in real-world environments.
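To fix ideas, here is a minimal sketch of such a two-level curiosity loop. It is entirely our own construction, not the thesis's algorithm: the color-contrast saliency measure, the mean-color descriptors, and the thresholds are placeholder choices.

```python
import numpy as np

def saliency_map(image):
    """Crude color-contrast saliency: distance of each pixel from the mean color."""
    mean_color = image.reshape(-1, image.shape[-1]).mean(axis=0)
    return np.linalg.norm(image - mean_color, axis=-1)

class KnowledgeBase:
    """Toy store of learned object descriptors (mean colors here)."""
    def __init__(self):
        self.known = []

    def is_novel(self, descriptor, tol=0.2):
        return all(np.linalg.norm(descriptor - k) > tol for k in self.known)

    def learn(self, descriptor):
        self.known.append(descriptor)

def curiosity_step(image, kb):
    # Lower level (perceptual curiosity): find what stands out in the scene.
    sal = saliency_map(image)
    mask = sal > sal.mean() + 2 * sal.std()
    if not mask.any():
        return "nothing salient"
    # Higher level (epistemic curiosity): learn it only if it is unknown.
    descriptor = image[mask].mean(axis=0)
    if kb.is_novel(descriptor):
        kb.learn(descriptor)
        return "learned a new object"
    return "already known"

kb = KnowledgeBase()
scene = np.zeros((64, 64, 3))
scene[20:30, 20:30] = [1.0, 0.2, 0.2]    # a red object on a dark background
print(curiosity_step(scene, kb))          # learned a new object
print(curiosity_step(scene, kb))          # already known
```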
22

Εξαγωγή γνώσης από αποθήκες υπηρεσιών Παγκόσμιου Ιστού / Knowledge extraction from Web services repositories

Κιούφτης, Βασίλειος 16 May 2014 (has links)
With the increasing use of the Web and of service-oriented systems, web services have become a widely adopted technology. Web service repositories are growing fast, creating the need for advanced tools to organize and index them. Clustering web services, which are usually represented by Web Service Description Language (WSDL) documents, enables web service search engines and users to organize and process large service repositories in groups with similar functionality and characteristics. In this work we propose a novel technique for clustering WSDL documents. The proposed method treats web services as categorical data: each service is described by a set of values extracted from the content and structure of its description file, and the mutual information between the clusters and their values serves as the clustering quality measure. We describe how web services are represented as categorical data and clustered with the LIMBO categorical clustering algorithm, while minimizing the information loss in the values extracted from the features. In the experimental evaluation, our approach outperforms, in terms of F-measure, techniques that use alternative similarity measures and methods for clustering WSDL documents.
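To make the information-theoretic clustering idea concrete, the following toy sketch (not the authors' code; the services and their feature values are invented) merges clusters greedily by the LIMBO merge cost, i.e. the mutual information lost when two clusters are combined.

```python
import numpy as np

services = {                      # hypothetical WSDL-derived feature values
    "WeatherSvc": ["get", "forecast", "city", "temperature"],
    "ClimateSvc": ["get", "forecast", "region", "temperature"],
    "BankSvc":    ["transfer", "account", "balance"],
    "PaymentSvc": ["transfer", "account", "invoice"],
}
vocab = sorted({v for vals in services.values() for v in vals})

def distribution(values):
    """p(value | cluster): normalized count vector over the vocabulary."""
    p = np.array([values.count(v) for v in vocab], dtype=float)
    return p / p.sum()

def merge_cost(c1, c2):
    """LIMBO-style information loss of merging two clusters: their combined
    weight times the Jensen-Shannon divergence of their value distributions."""
    p1, p2 = distribution(c1["values"]), distribution(c2["values"])
    w1 = c1["weight"] / (c1["weight"] + c2["weight"])
    m = w1 * p1 + (1 - w1) * p2

    def kl(p, q):                 # Kullback-Leibler divergence, base 2
        mask = p > 0
        return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))

    js = w1 * kl(p1, m) + (1 - w1) * kl(p2, m)
    return (c1["weight"] + c2["weight"]) * js

# Start with one cluster per service; merge the cheapest pair until 2 remain.
clusters = [{"names": [n], "values": list(v), "weight": 1 / len(services)}
            for n, v in services.items()]
while len(clusters) > 2:
    i, j = min(((i, j) for i in range(len(clusters))
                for j in range(i + 1, len(clusters))),
               key=lambda ij: merge_cost(clusters[ij[0]], clusters[ij[1]]))
    a, b = clusters[i], clusters.pop(j)
    a["names"] += b["names"]
    a["values"] += b["values"]
    a["weight"] += b["weight"]

for c in clusters:
    print(sorted(c["names"]))     # expect a weather-like and a payment-like group
```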
23

Solving Winograd Schema Challenge: Using Semantic Parsing, Automatic Knowledge Acquisition and Logical Reasoning

January 2014 (has links)
The Turing test has been a benchmark for measuring human-level intelligence in computers since Alan Turing proposed it in 1950. Over the last 60 years, however, applications such as ELIZA, PARRY, Cleverbot and Eugene Goostman have claimed to pass the test; these applications either rely on tricks to fool humans in a text-chat setting, or AI communities disagree about whether they truly passed. This has led to the school of thought that the Turing test might not be the ideal test for predicting human-level intelligence in machines. Consequently, the Winograd Schema Challenge has been suggested as an alternative. As opposed to judging intelligent behavior through chat, as the Turing test does, the Winograd Schema Challenge is a question-answering test. It consists of sentence and question pairs such that the answer to the question depends on the resolution of a definite pronoun or adjective in the sentence. The answers are fairly intuitive for humans, but they are difficult for machines because they require background or commonsense knowledge about the sentence. In this thesis, I propose a novel technique to solve the Winograd Schema Challenge. The technique has three basic modules at its disposal: a Semantic Parser that parses the English text (both sentences and questions) into a formal representation; an Automatic Background Knowledge Extractor that extracts the background knowledge pertaining to a given Winograd sentence; and an Answer Set Programming reasoning engine that reasons over the given Winograd sentence and the corresponding background knowledge. The applicability of the technique is illustrated by solving a subset of the Winograd Schema Challenge pertaining to a certain type of background knowledge, on which it achieves a notable accuracy. / Dissertation/Thesis / Masters Thesis Computer Science 2014
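To make the three-module pipeline concrete, here is a toy Python sketch. It is not the thesis's system: the "formal representation" is hand-written triples, and the background rule stands in for what the Automatic Background Knowledge Extractor would harvest from text.

```python
# Winograd sentence: "The trophy does not fit in the suitcase because it is too big."
facts = {("trophy", "not_fit_in", "suitcase"), ("it", "is", "big")}
candidates = ["trophy", "suitcase"]

# Background knowledge of the kind an extractor might harvest:
# if x does not fit in y, then x is big (relative to y).
def apply_background_rule(facts):
    derived = set(facts)
    for (x, rel, y) in facts:
        if rel == "not_fit_in":
            derived.add((x, "is", "big"))
    return derived

# Reasoning: the pronoun resolves to the candidate whose substitution
# makes the sentence consistent with the derived knowledge.
derived = apply_background_rule(facts)
answer = [c for c in candidates if (c, "is", "big") in derived]
print(answer)  # ['trophy']
```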
24

Expansão de ontologia através de leitura de máquina contínua / Ontology expansion through continuous machine reading

Barchi, Paulo Henrique 31 March 2015 (has links)
Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) / NELL (the Never-Ending Language Learning system) (CARLSON et al., 2010) is the first system to practice the techniques of the never-ending machine learning paradigm. It has an inactive component for continually extending its knowledge base (KB): OntExt (MOHAMED; Hruschka Jr.; MITCHELL, 2011). Its main idea is to identify, and add to the KB, new relations that are frequently asserted in large text collections. To identify such context patterns, co-occurrence matrices are used to structure the normalized co-occurrence values between context phrases for each category pair. Each matrix is clustered with the Weka K-means algorithm (HALL et al., 2009), yielding one possible new relation per cluster. This work presents newOntExt: a new approach with new features that makes the ontology-extension task feasible for NELL. The approach also addresses the alternative task of validating and naming new relations found by another NELL component, Prophet. The resulting relations are classified as valid or invalid by humans; precision is calculated for each experiment and the results are compared to those of OntExt. Initial results show that ontology extension with newOntExt can help never-ending learning systems expand their volume of beliefs and keep learning with high precision by acting in self-supervision and self-reflection.
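The co-occurrence-and-clustering step can be pictured with a small sketch. The context phrases and counts below are invented, and scikit-learn's K-means stands in for the Weka implementation used in the thesis.

```python
import numpy as np
from sklearn.cluster import KMeans

# Rows/columns: context phrases observed between instances of one category
# pair (say, <city, river>); entry (i, j) counts how often phrases i and j
# connect the same instance pairs.
phrases = ["lies on", "is crossed by", "is the capital of", "is located in"]
cooc = np.array([[9., 7., 1., 0.],
                 [7., 8., 0., 1.],
                 [1., 0., 6., 5.],
                 [0., 1., 5., 7.]])
cooc = cooc / cooc.sum(axis=1, keepdims=True)   # normalize rows

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(cooc)
for k in range(2):
    members = [p for p, l in zip(phrases, labels) if l == k]
    print(f"candidate relation {k}: {members}")
# Each cluster of near-synonymous context patterns suggests one candidate
# relation (e.g. cityLiesOnRiver), which a human then validates, as in the text.
```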
25

Simulation numérique et approche orientée connaissance pour la découverte de nouvelles molécules thérapeutiques / Numeric simulation and knowledge-oriented approach for the discovery of new therapeutic molecules

Ghemtio Wafo, Léo Aymar 07 May 2010 (has links)
Therapeutic innovation has traditionally advanced through the combination of experimental screening and molecular modelling. In practice, the latter approach is often limited by the shortage of experimental data, particularly structural and biological information. Today the situation has changed completely with the high-throughput sequencing of the human genome and the advances achieved in determining the three-dimensional structures of proteins. This gives access to an enormous amount of data that can be used to search for new treatments for a large number of diseases. In this respect, computational approaches to high-throughput virtual screening (HTVS) offer an alternative or a complement to experimental methods, saving both time and money in the discovery of new treatments. However, most of these approaches suffer from the same limitations. One is the cost and computing time required to estimate the binding of a whole collection of molecules to a target, which is considerable in a high-throughput context; the accuracy of the results obtained is another evident challenge in the domain. The need to manage a large amount of heterogeneous data is also particularly crucial. To overcome the current limitations of HTVS and to optimize the first stages of the drug discovery process, I set up an innovative methodology with two advantages. First, it manages a large mass of heterogeneous data and extracts knowledge from it; second, it distributes the necessary calculations over a grid computing platform comprising several thousand processors. The whole methodology is integrated into a multiple-step virtual screening funnel. The aim is to take into account, in the form of constraints, the knowledge available about the problem, in order to optimize the accuracy of the results and the cost, in time and money, of high-throughput virtual screening.
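As an illustration of the funnel idea, the following sketch is entirely hypothetical: the filters, the scoring function, and the molecule records are placeholders, and a real pipeline would invoke docking software on grid nodes rather than a local process pool.

```python
from multiprocessing import Pool
from typing import NamedTuple

class Molecule(NamedTuple):
    name: str
    weight: float      # molecular weight
    logp: float        # lipophilicity

LIBRARY = [Molecule("m1", 320.0, 2.1), Molecule("m2", 610.0, 5.9),
           Molecule("m3", 450.0, 3.4), Molecule("m4", 180.0, 0.8)]

def knowledge_filter(mol):
    """Step 1: cheap constraints derived from prior knowledge
    (here, Lipinski-like bounds) prune the library before scoring."""
    return mol.weight < 500 and mol.logp < 5

def dock_score(mol):
    """Step 2: expensive per-molecule scoring. This stand-in is trivial;
    the point is that the step is embarrassingly parallel."""
    return mol.name, -0.01 * mol.weight - 0.5 * mol.logp

if __name__ == "__main__":
    survivors = [m for m in LIBRARY if knowledge_filter(m)]
    with Pool() as pool:                     # grid nodes in the real setting
        scores = pool.map(dock_score, survivors)
    for name, s in sorted(scores, key=lambda x: x[1]):
        print(name, round(s, 2))
```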
26

Knowledge Acquisition in a System

Thomas, Christopher J. January 2012 (has links)
No description available.
27

Rule Extraction from Multi-layer Feed-forward Neural Networks

柯文乾, Ke, Wen-Chyan Unknown Date (has links)
Neural networks have been successfully applied to a variety of problems, including classification and function approximation. They are especially useful for function approximation because they have been shown to be universal approximators. In the past, function approximation problems were mainly analyzed with linear tools, yet most such problems are inherently nonlinear, so the demand for nonlinear analysis tools is in fact large. Since 1986, the neural network has been regarded as a black box: it is hard to judge whether what a network has learned is reasonable, and the network cannot effectively help users develop domain knowledge. A reasonable and effective method for analyzing neural networks is therefore important. Here we propose such an analytic method: it extracts rules from the neural network and analyzes them via linear programming, without depending on any analysis of a data set; domain knowledge is then generalized from these rules via the sign test, a non-parametric statistical method. We take bond pricing as an instance to examine the feasibility of the proposed method. The results show that the rules extracted by our method are reasonable, and that most of the domain knowledge about bond pricing generalized from these rules is also reasonable.
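As one way to picture the approach, here is a sketch under our own assumptions (the stand-in network, the rule region, and the residual test are invented, and the thesis's actual linear programs may differ): a local linear rule is fitted to a trained network by linear programming, then checked with a sign test.

```python
import numpy as np
from scipy.optimize import linprog
from scipy.stats import binomtest

rng = np.random.default_rng(0)

def network(x):
    """Stand-in for a trained one-hidden-layer feed-forward network."""
    h = np.tanh(np.outer(x, [2.0, -1.0]) + [0.1, 0.3])
    return h @ [0.8, -0.5]

# Fit the rule "on [0, 0.5], y ~= a*x + b" by minimizing the maximum
# absolute deviation, a linear program in (a, b, t):
#   minimize t  subject to  -t <= a*x_i + b - y_i <= t
xs = np.linspace(0.0, 0.5, 50)
ys = network(xs)
A_ub, b_ub = [], []
for x, y in zip(xs, ys):
    A_ub.append([x, 1.0, -1.0]);   b_ub.append(y)     #  a*x + b - t <=  y
    A_ub.append([-x, -1.0, -1.0]); b_ub.append(-y)    # -a*x - b - t <= -y
res = linprog(c=[0.0, 0.0, 1.0], A_ub=A_ub, b_ub=b_ub,
              bounds=[(None, None), (None, None), (0, None)])
a, b, t = res.x
print(f"rule: y ~= {a:.3f}*x + {b:.3f}  (max error {t:.4f} on [0, 0.5])")

# Sign test on fresh points: if the rule is unbiased, positive and
# negative residuals should be equally likely.
fresh = rng.uniform(0.0, 0.5, 40)
residuals = network(fresh) - (a * fresh + b)
pos = int(np.sum(residuals > 0))
print(binomtest(pos, n=len(residuals), p=0.5))
```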
28

Extraction en langue chinoise d'actions spatiotemporalisées réalisées par des personnes ou des organismes / Extraction of spatiotemporally located actions performed by individuals or organizations from Chinese texts

Wang, Zhen 09 June 2016 (has links)
We have developed an automatic analyser and an extraction module for Chinese language processing. The analyser performs automatic Chinese word segmentation based on linguistic rules and dictionaries, part-of-speech tagging based on n-gram statistics, and dependency grammar parsing. The module extracts information about named entities and activities. To achieve these goals, we tackled the following main issues: segmentation and part-of-speech ambiguity; unknown-word identification in Chinese text; and attachment ambiguity in parsing. Chinese texts are analysed sentence by sentence. Given a sentence, the analyser begins with typographic processing to identify sequences of Latin characters and numbers. Dictionaries are then used for a preliminary segmentation into words. Linguistic rules create proper-noun hypotheses and adjust the weights of some word categories, taking the left and right context of each word into account. An n-gram language model built from a training corpus selects the best word segmentation and parts of speech. Dependency grammar parsing annotates the relations between words. A first step of named entity recognition is performed after parsing; its goal is to identify single-word and noun-phrase-based named entities and to determine their semantic type. These named entities are then used in knowledge extraction, whose rules validate the named entities or change their types. Knowledge extraction consists of two steps: automatic content extraction and tagging from the analysed text, followed by control of the extracted contents and ontology-based co-reference resolution.
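The dictionary-plus-language-model segmentation step can be pictured with a small sketch. The dictionary, the counts, and the unigram scoring below are invented; the analyser described above uses richer n-gram statistics and linguistic rules.

```python
import math

DICT = {"中国": 8000, "人民": 6000, "中国人": 3000, "民生": 500,
        "生活": 4000, "人": 9000, "活": 1000, "中": 2000,
        "国": 2000, "民": 1500}
TOTAL = sum(DICT.values())

def score(word):
    # Unigram log-probability, with a small floor for out-of-vocabulary words.
    return math.log(DICT.get(word, 0.5) / TOTAL)

def segment(text):
    """Viterbi over the segmentation lattice: best[i] = best score of text[:i]."""
    n = len(text)
    best = [0.0] + [float("-inf")] * n
    back = [0] * (n + 1)
    for i in range(1, n + 1):
        for j in range(max(0, i - 4), i):          # candidate words up to 4 chars
            s = best[j] + score(text[j:i])
            if s > best[i]:
                best[i], back[i] = s, j
    words, i = [], n
    while i > 0:
        words.append(text[back[i]:i])
        i = back[i]
    return words[::-1]

print(segment("中国人民生活"))   # e.g. ['中国', '人民', '生活']
```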
29

Partager le savoir du lexicographe: extraction et modélisation ontologique des savoirs lexicographiques / Sharing the lexicographer's knowledge: extraction and ontological modeling of lexicographic knowledge

Comeau, Sophie 12 1900 (has links)
This research concerns lexicology, lexicography, and the teaching/learning of vocabulary. It is part of the project Modélisation ontologique des savoirs lexicographiques en vue de leur application en linguistique appliquée (ontological modeling of lexicographic knowledge for application in applied linguistics), nicknamed Lexitation, which is, to our knowledge, the first attempt to extract lexicographic knowledge (i.e., the declarative and procedural knowledge used by lexicographers) using an experimental method. The project rests on the observation that lexicographic knowledge has a crucial role to play in lexicology, but also in the teaching and learning of vocabulary. In this thesis we describe the methods and results of our first experiments, carried out with the Think Aloud Protocol (Ericsson and Simon, 1993). We explain the overall organization of the experiments and how the extracted lexicographic knowledge is modeled to form an ontology. Finally, we discuss possible applications of this work to vocabulary teaching, in particular to teacher training.
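As a purely hypothetical illustration of what modeling extracted lexicographic knowledge as an ontology can look like (the classes, properties, and RDF encoding below are our invention, not Lexitation's actual model):

```python
from rdflib import Graph, Literal, Namespace, RDF, RDFS

LEX = Namespace("http://example.org/lexitation#")
g = Graph()
g.bind("lex", LEX)

# Declarative knowledge: concepts the lexicographer manipulates.
g.add((LEX.LexicalUnit, RDF.type, RDFS.Class))
g.add((LEX.Collocation, RDFS.subClassOf, LEX.LexicalUnit))

# Procedural knowledge: a strategy observed in a think-aloud session.
g.add((LEX.CheckCorpusExamples, RDF.type, LEX.Strategy))
g.add((LEX.CheckCorpusExamples, LEX.appliesTo, LEX.Collocation))
g.add((LEX.CheckCorpusExamples, RDFS.comment,
       Literal("Consult corpus examples before writing a definition.")))

print(g.serialize(format="turtle"))
```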
