  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.

Modelovanje i pretraživanje nad nestruktuiranim podacima i dokumentima u e-Upravi Republike Srbije / Modeling and searching over unstructured data and documents in e-Government of the Republic of Serbia

Nikolić Vojkan 27 September 2016 (has links)
<p>Nowadays, e-government services in various fields use Question Answering Systems (QAS) in an attempt to understand text and help citizens get answers to their queries quickly and at any time. Automatic mapping of relevant documents stands out as an important application of an automatic query-document classification strategy. This doctoral thesis aims to contribute to the identification of unstructured documents and represents an important step towards clarifying the role of explicit concepts in information retrieval in general. The most common representation scheme in text categorization is the bag-of-words (BoW) approach, especially when a large knowledge base is available. The thesis introduces a new approach to concept-based text representation and applies text categorization to create predefined classes for short text documents. It also presents a classification-based algorithm modeled for queries that match a topic. A complicating factor is that this concept, which represents terms with a high frequency of occurrence in queries, relies on similarities to previously defined document classes. Experimental results in the field of the Criminal Code of the Republic of Serbia show that the concept-based text representation performs satisfactorily even when no vocabulary exists for the given field.</p>
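The query-document classification strategy described above can be sketched as a minimal bag-of-words matcher. This is a generic illustration under assumed details — the class names and mini-documents are invented, and the thesis's concept-based model is richer than this:

```python
from collections import Counter
import math

def bow(text):
    # Lowercased word counts: the bag-of-words representation.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical classes: each predefined class is the bag of its documents.
classes = {
    "theft":   bow("theft of property stealing goods theft penalty"),
    "traffic": bow("traffic offence vehicle speed driving penalty"),
}

def classify(query):
    # Map a citizen query to the most similar predefined document class.
    return max(classes, key=lambda c: cosine(bow(query), classes[c]))

print(classify("penalty for stealing goods"))  # → theft
```

In a real deployment the class bags would be built from the actual legal-domain documents rather than toy strings.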

De la mise à l’épreuve de l’alimentation par l’antibiorésistance au développement des concepts sans antibiotique et One Health ˸ publicisation et communication en France et aux États-Unis / From the recognition of the link between antibiotic resistance and food to the development of the antibiotic free production and the One Health approach ˸ publicization and communication in France and in the United States

Badau, Estera-Tabita 20 May 2019 (has links)
In a comparative perspective between France and the United States, this research analyses the process of publicizing the links between antibiotic resistance and food, as well as its contribution to the development of so-called antibiotic-free production and the implementation of the One Health approach. Starting from the growing awareness of the consequences of antibiotic use in livestock farming, the study draws on the pragmatist approach to the constitution of public problems. It is based on a hybrid corpus of documents published between 1980 and 2016 (print press, institutional literature, and semi-structured interviews). The method uses textometric tools derived from discourse analysis and focuses on the emergence of the names and formulas that designate the problem, its causes, and its solutions. The comparison reveals opposite trajectories in the two countries. In France, the process follows a top-down pattern and is characterized by late publicization following the initiatives of European and international health authorities; the appropriation of the problem by consumer associations, as well as the commitment of agri-food actors to antibiotic-free production, has emerged only recently. In the United States, by contrast, the process follows a bottom-up model, after a public of non-governmental organizations formed around the problem. Their mobilization contributed significantly to the development of antibiotic-free farming programs, to placing the problem on the governmental agenda, and to the launch of a national plan within a One Health approach.
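The textometric tracking of emerging formulas described in this abstract reduces, at its core, to counting how often candidate phrases appear per year in a dated corpus. A minimal sketch with an invented mini-corpus (the actual study works on press and institutional documents from 1980-2016):

```python
from collections import defaultdict

def formula_emergence(corpus, formulas):
    # Count, per year, how many documents mention each formula —
    # the basic signal behind tracking a phrase's emergence.
    counts = defaultdict(lambda: defaultdict(int))
    for year, text in corpus:
        low = text.lower()
        for f in formulas:
            if f in low:
                counts[f][year] += 1
    return {f: dict(years) for f, years in counts.items()}

# Invented mini-corpus of dated press snippets.
corpus = [
    (1999, "New report on antibiotic resistance in livestock."),
    (2012, "Brands now advertise antibiotic free chicken."),
    (2014, "The antibiotic free label spreads; one health cited."),
    (2015, "A one health plan targets antibiotic resistance."),
]
print(formula_emergence(corpus, ["antibiotic free", "one health"]))
# → {'antibiotic free': {2012: 1, 2014: 1}, 'one health': {2014: 1, 2015: 1}}
```

Real textometric toolkits add normalization, lemmatization, and significance testing on top of such raw counts.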

文件距離為基礎kNN分群技術與新聞事件偵測追蹤之研究 / A study of relative text-distance-based kNN clustering technique and news events detection and tracking

陳柏均, Chen, Po Chun Unknown Date (has links)
A news event can be described as "a set of similar news stories on the same topic within a given time interval". Most news articles cover only fragments of a complete event, and their content varies with the outlet's standpoint or the reporter's angle; moreover, the sheer volume of news makes it hard to grasp an event as a whole. This study therefore uses text mining techniques to cluster related news stories into events, increasing the value the news provides. Classification and clustering are common text mining steps and are the main methods used here to group news into events. The k-nearest-neighbor (kNN) search is one of the most common classification algorithms, but it must compare every pair of news articles and sort the results to select the nearest neighbors, which creates a performance bottleneck in practice. This study proposes RTD-based kNN (Relative Text-Distance-based kNN), which establishes a base reference point in the vector space; every document's distance to this base induces a relative near-far ordering, so the more likely candidate documents can be filtered by their relative distances before the top k nearest neighbors are selected, reducing the number of comparisons and improving efficiency. 
The study extracted 62 events (742 news articles in total) from Google News as the test and evaluation data to compare RTD-based kNN with kNN on news event clustering. The experiments show that the base point is best built from common-word vocabulary and that merging clusters after clustering improves the results; with no significant difference in F-measure between RTD-based kNN and kNN (α = 0.05), RTD-based kNN reduced computation time by 28.13% compared with kNN. This indicates that RTD-based kNN offers a better method for clustering news events. Finally, directions for future research are provided.

雲端運算環境下基於知識本體之資訊檢索系統建置-以半導體產業為例 / Constructing ontology-based information retrieval system in cloud computing environment – the case of semiconductor industry

李佳穎, Li, Chia Ying Unknown Date (has links)
This study provides an intelligent search facility for the semiconductor industry that lets users search quickly and precisely over a large number of documents. To this end, it defines a knowledge space and its constituent elements to describe real-world knowledge, then develops a program that builds the knowledge space and a knowledge-space search mechanism to improve user productivity. The techniques used include: (1) constructing a semiconductor industry ontology, (2) computing the co-occurrence frequency of term pairs, (3) computing the relatedness between terms and documents, and (4) developing a knowledge-space-based search environment.
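Technique (2) above, the co-occurrence frequency of two terms, can be sketched as document-level pair counting. The mini-corpus is invented for illustration; the thesis computes this over semiconductor-industry documents:

```python
from itertools import combinations
from collections import Counter

def cooccurrence(documents):
    # Count, for every unordered term pair, in how many documents
    # both terms appear together.
    counts = Counter()
    for doc in documents:
        # sorted() gives each pair a canonical (alphabetical) key.
        for pair in combinations(sorted(set(doc.lower().split())), 2):
            counts[pair] += 1
    return counts

docs = [
    "wafer etching process",
    "wafer lithography process",
    "etching process control",
]
co = cooccurrence(docs)
print(co[("process", "wafer")])    # → 2
print(co[("etching", "process")])  # → 2
```

High pair counts are the raw material for the term-term and term-document relatedness scores used by the search mechanism.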

Extraction automatique de connaissances pour la décision multicritère / Automatic knowledge extraction for multicriteria decision-making

Plantié, Michel 29 September 2006 (has links) (PDF)
This thesis addresses, without taking sides, the delicate subject of cognitive automation. It proposes a complete software chain to support each step of decision-making, and deals in particular with automating the learning phase by turning actionable knowledge — knowledge useful for action — into a computational entity that algorithms can manipulate.
The model underlying our group decision support system (GDSS) relies heavily on automatic knowledge processing. Data mining, multicriteria analysis, and optimization are complementary techniques used to build a decision artifact that amounts to a cybernetic interpretation of the economist Simon's decision model. The epistemic uncertainty inherent in a decision is measured by the decision risk, which analyzes the factors discriminating between alternatives. Several attitudes toward controlling decision risk can be considered: the GDSS can be used to validate, verify, or refute a point of view. In every case, the control exerted over epistemic uncertainty affects the dynamics of the decision process. Instrumenting the learning phase of the decision process thus amounts to building the actuator of a feedback loop that regulates the decision dynamics. Our model sheds formal light on the links between epistemic uncertainty, decision risk, and decision stability.
The fundamental concepts of actionable knowledge (AK) and automatic indexing, on which our NLP models and tools rest, are analyzed. The notion of actionable knowledge receives a new interpretation in this cybernetic view of decision-making: it is the knowledge manipulated by the GDSS actuator to control the decision dynamics.
A brief survey of the most proven learning techniques for automatic knowledge extraction in NLP is then given. All of these notions and techniques are applied to the specific problem of automatically extracting AKs in a multicriteria evaluation process. Finally, the example of a video-store manager seeking to optimize his investments according to his customers' preferences illustrates the computerized process as a whole.

Concept Mining: A Conceptual Understanding based Approach

Shehata, Shady January 2009 (has links)
Due to the rapid daily growth of information, there is a considerable need to extract and discover valuable knowledge from data sources such as the World Wide Web. Most common text mining techniques are based on the statistical analysis of a term, either a word or a phrase. These techniques treat documents as bags of words and pay no attention to the meaning of the document content. Moreover, statistical analysis of term frequency captures the importance of a term within a document only; two terms can have the same frequency in their documents while one contributes more to the meaning of its sentences than the other. There is therefore a pressing need for a model that captures the meaning of linguistic utterances in a formal structure and indicates the terms that carry the semantics of the text — the terms that express the concepts of a sentence and thereby reveal the topic of the document. This thesis introduces a new concept-based model that analyzes terms at the sentence, document, and corpus levels rather than at the document level only. The model effectively discriminates between terms that are unimportant to sentence semantics and terms that hold the concepts representing the sentence meaning. It consists of a concept-based statistical analyzer, a conceptual ontological graph representation, a concept extractor, and a concept-based similarity measure. Each term that contributes to the sentence semantics is assigned two weights, one by the concept-based statistical analyzer and one by the conceptual ontological graph representation; these are combined into a single weight, and the concept extractor selects the concepts with the maximum combined weights. The similarity between documents is then calculated with a new concept-based similarity measure. 
The proposed similarity measure takes full advantage of the concept analysis measures at the sentence, document, and corpus levels. Large sets of experiments on different datasets in text clustering, categorization, and retrieval compare traditional weighting extensively with the concept-based weighting produced by the model. The results demonstrate substantial quality improvements from: (1) concept-based term frequency (tf), (2) conceptual term frequency (ctf), (3) the concept-based statistical analyzer, (4) the conceptual ontological graph, and (5) the combined concept-based model. Clustering results are evaluated with two quality measures, F-measure and entropy; categorization results with three, micro-averaged F1, macro-averaged F1, and error rate; and retrieval results with three, precision at 10 retrieved documents P(10), the preference measure (bpref), and mean uninterpolated average precision (MAP). All of these quality measures improve when the newly developed concept-based model is used.
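The two-weight combination at the heart of the model can be sketched as below. The abstract does not give the exact formulas, so the per-sentence conceptual term frequency and the linear blend are assumed stand-ins, not the thesis's actual definitions:

```python
def ctf_weight(term, sentences):
    # Assumed sketch of conceptual term frequency: the term's
    # frequency per sentence, averaged over the document's sentences
    # (sentence-level rather than document-level analysis).
    freqs = [s.lower().split().count(term) for s in sentences]
    return sum(freqs) / len(freqs) if freqs else 0.0

def combined_weight(stat_w, graph_w, alpha=0.5):
    # The model combines the statistical-analyzer weight and the
    # conceptual-ontological-graph weight into one; a linear blend
    # stands in for the unspecified combination rule.
    return alpha * stat_w + (1 - alpha) * graph_w

doc = ["the model captures sentence semantics",
       "sentence concepts drive the topic"]
print(round(ctf_weight("sentence", doc), 2))  # → 1.0
```

The concept extractor would then keep the terms whose `combined_weight` is maximal within each sentence.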

Supplementing consumer insights at Electrolux by mining social media: An exploratory case study

Chaudhary, Amit January 2011 (has links)
Purpose – The aim of this thesis is to explore the possibility of text mining social media for consumer insights, from an organizational perspective. Design/methodology/approach – An exploratory embedded single-case study with an inductive approach and a partially mixed, concurrent, dominant-status mixed-method research design. The case study contains three studies that triangulate the research findings and support the objective of using social media for consumer insights into new products and new ideas and for supporting any organization's research and development process. Findings – Text mining is a useful, novel, flexible, and unobtrusive method for harnessing the hidden information in social media. By text mining social media, an organization can extract consumer insights from a large data set; the initiative requires an understanding of social media and its building blocks. In addition, a consumer-focused product development approach both drives social media mining and is enriched by the consumer insights it yields. Research limitations/implications – Text mining is a relatively young field, and better analytical toolkits would promote the use of this novel method. Researchers in consumer-driven new product development can use social media as additional evidence. Practical implications – Consumer insights gained from text mining social media within a workable ethical policy are a positive implication for any organization; unlike conventional marketing research methods, text mining social media is cost- and time-effective. Originality/value – This thesis innovatively applies text mining tools from computer science to mine social media for a better understanding of consumers, thereby enriching the field of marketing research in a cross-industry effort. 
The ability of consumers to spread electronic word of mouth (eWOM) through social media is no secret, and organizations should now consider social media as a source to supplement, if not replace, the insights captured with conventional marketing research methods. Keywords – Social media, Web 2.0, Consumer-generated content, Text mining, Mixed methods design, Consumer insights, Marketing research, Case study, Analytic coding, Hermeneutics, Asynchronous, Emergent strategy. Paper type – Master's thesis

Fuzzy Cluster-Based Query Expansion

Tai, Chia-Hung 29 July 2004 (has links)
Advances in information and network technologies have fostered the creation and availability of a vast amount of online information, typically in the form of text documents. Information retrieval (IR) pertains to determining the relevance between a user query and documents in the target collection, then returning those documents that are likely to satisfy the user's information needs. One challenging issue in IR is word mismatch, which occurs when concepts can be described by different words in the user queries and/or documents. Query expansion is a promising approach for dealing with word mismatch in IR. In this thesis, we develop a fuzzy cluster-based query expansion technique to solve the word mismatch problem. Using existing expansion techniques (i.e., global analysis and non-fuzzy cluster-based query expansion) as performance benchmarks, our empirical results suggest that the fuzzy cluster-based query expansion technique can provide a more accurate query result than the benchmark techniques can.
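The fuzzy cluster-based expansion idea — add terms whose fuzzy membership ties them to the same cluster as a query term — can be sketched as follows. The membership values and threshold are invented for illustration; in the thesis they would come from fuzzy clustering of the target collection:

```python
# Hypothetical fuzzy memberships of terms in two term clusters.
membership = {
    "car":    {"vehicles": 0.9, "finance": 0.1},
    "auto":   {"vehicles": 0.8, "finance": 0.2},
    "loan":   {"vehicles": 0.1, "finance": 0.9},
    "engine": {"vehicles": 0.7, "finance": 0.0},
}

def expand(query_terms, threshold=0.6):
    expanded = list(query_terms)
    for q in query_terms:
        for cluster, m in membership.get(q, {}).items():
            # Only clusters the query term strongly belongs to.
            if m < threshold:
                continue
            # Add every term strongly associated with that same
            # fuzzy cluster — these are the expansion candidates
            # that address word mismatch ("auto" for "car").
            for term, ms in membership.items():
                if term not in expanded and ms.get(cluster, 0) >= threshold:
                    expanded.append(term)
    return expanded

print(expand(["car"]))  # → ['car', 'auto', 'engine']
```

Because memberships are fuzzy (a term can belong partially to several clusters), a term like "auto" can expand both vehicle and finance queries at different strengths, which crisp clustering cannot express.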

Discovering Discussion Activity Flows in an On-line Forum Using Data Mining Techniques

Hsieh, Lu-shih 22 July 2008 (has links)
In the Internet era, more and more courses are taught through a course management system (CMS) or learning management system (LMS). In an asynchronous virtual learning environment, an instructor has the need to beware the progress of discussions in forums, and may intervene if ecessary in order to facilitate students¡¦ learning. This research proposes a discussion forum activity flow tracking system, called FAFT (Forum Activity Flow Tracer), to utomatically monitor the discussion activity flow of threaded forum postings in CMS/LMS. As CMS/LMS is getting popular in facilitating learning activities, the proposedFAFT can be used to facilitate instructors to identify students¡¦ interaction types in discussion forums. FAFT adopts modern data/text mining techniques to discover the patterns of forum discussion activity flows, which can be used for instructors to facilitate the online learning activities. FAFT consists of two subsystems: activity classification (AC) and activity flow discovery (AFD). A posting can be perceived as a type of announcement, questioning, clarification, interpretation, conflict, or assertion. AC adopts a cascade model to classify various activitytypes of posts in a discussion thread. The empirical evaluation of the classified types from a repository of postings in earth science on-line courses in a senior high school shows that AC can effectively facilitate the coding rocess, and the cascade model can deal with the imbalanced distribution nature of discussion postings. AFD adopts a hidden Markov model (HMM) to discover the activity flows. A discussion activity flow can be presented as a hidden Markov model (HMM) diagram that an instructor can adopt to predict which iscussion activity flow type of a discussion thread may be followed. The empirical results of the HMM from an online forum in earth science subject in a senior high school show that FAFT can effectively predict the type of a discussion activity flow. 
Thus, the proposed FAFT can be embedded in a course management system to automatically predict the activity flow type of a discussion thread, and in turn reduce the teachers¡¦ loads on managing online discussion forums.
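The flow discovery step can be sketched, in simplified first-order form, as a transition model over the activity types. The probabilities below are invented (FAFT learns them, and its full HMM also separates hidden states from observations), so this is a minimal illustration of the idea rather than the system itself:

```python
# Illustrative transition probabilities between posting activity
# types (values invented; the thesis learns them from real threads).
transitions = {
    "questioning":    {"clarification": 0.5, "interpretation": 0.3, "conflict": 0.2},
    "clarification":  {"interpretation": 0.6, "assertion": 0.4},
    "interpretation": {"assertion": 0.7, "conflict": 0.3},
}

def predict_next(activity):
    # Most probable next activity type in the discussion flow.
    nxt = transitions.get(activity)
    return max(nxt, key=nxt.get) if nxt else None

def flow_likelihood(flow):
    # Probability of an observed activity sequence under the model,
    # i.e. how typical this discussion flow is.
    p = 1.0
    for a, b in zip(flow, flow[1:]):
        p *= transitions.get(a, {}).get(b, 0.0)
    return p

print(predict_next("questioning"))  # → clarification
print(flow_likelihood(["questioning", "clarification", "assertion"]))
```

An instructor-facing system would flag threads whose likelihood under "healthy" flow models is low, signaling a discussion that may need intervention.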
