11

Text Categorization for E-Government Applications: The Case of City Mayor's Mailbox

Kuo, Chiung-Jung 29 August 2006 (has links)
The central government and most local governments in Taiwan have adopted e-mail services that allow citizens to request services or express their opinions over the Internet. Traditionally, these requests and opinions must be manually classified and routed to the appropriate departments for service rendering. However, due to the ever-increasing number of requests and opinions received, the manual classification approach is time-consuming and has become impractical. In this study, we therefore apply text categorization techniques to automatically construct a classification mechanism in order to establish an efficient e-government service portal. The purpose of this thesis is to investigate the effectiveness of different text categorization methods in supporting the automatic classification of service request/opinion e-mails sent to the Mayor's mailbox. Specifically, in each phase of text categorization learning, we adopt and evaluate two methods commonly employed in prior research. In the feature selection phase, both the maximal χ² statistic and the weighted-average χ² statistic are evaluated. We consider the binary and TF×IDF representation schemes in the document representation phase. Finally, we adopt the decision tree induction technique and the support vector machines (SVM) technique for inducing a text categorization model for our target e-government application. Our empirical evaluation results show that the text categorization method that employs the maximal χ² statistic for feature selection, the binary representation scheme, and support vector machines as the underlying induction algorithm can reach an accuracy rate of 77.28% and recall and precision rates of more than 77%. Such satisfactory classification effectiveness suggests that the text categorization approach can be employed to establish an effective and intelligent e-government service portal.
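The best-performing configuration described above maps onto a standard machine-learning pipeline. The sketch below is a minimal illustration, assuming scikit-learn and a hypothetical toy set of citizen e-mails with department labels; scikit-learn's global chi2 score stands in for the thesis's per-class maximal χ² statistic, and the original Mayor's-mailbox corpus is not reproduced here.

```python
# Sketch of a chi-square + binary bag-of-words + linear SVM pipeline.
# The e-mails and department labels are hypothetical stand-ins for the
# Mayor's-mailbox corpus; sklearn's chi2 is a global chi-square score,
# not the per-class maximal chi-square statistic used in the thesis.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

emails = [
    "the street light outside my house has been broken for a week",
    "please fix the large pothole on the main road near the market",
    "garbage was not collected on our street this morning",
    "illegal dumping of trash keeps happening in the riverside park",
    "i want to report a noisy construction site operating at midnight",
    "loud karaoke from the bar next door keeps the whole block awake",
]
departments = ["public_works", "public_works", "sanitation",
               "sanitation", "noise_control", "noise_control"]

pipeline = Pipeline([
    ("binary_bow", CountVectorizer(binary=True)),   # binary representation
    ("chi2_select", SelectKBest(chi2, k=30)),       # keep top-ranked features
    ("svm", LinearSVC()),                           # SVM induction algorithm
])
pipeline.fit(emails, departments)

# Route a new citizen request to a predicted department.
print(pipeline.predict(["there is a deep pothole on my street"]))
```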
12

Development of Information Extraction-based Event Detection Technique

Lee, Yen-Hsien 30 July 2000 (has links)
Environmental scanning is an important process that acquires and uses information about events, trends, and relationships in an organization's external environment. It permits an organization to adapt to its environment and to develop effective responses to secure or improve its position in the future. An event detection technique that identifies the onset of new events from streams of news stories would facilitate an organization's environmental scanning process. However, traditional feature-based event detection techniques, which identify whether a news story contains an unseen event by comparing the similarity of words between the news story and past news stories, suffer from some limitations (e.g., the features appearing in a news document cannot fully represent the event described in it). Thus, in this study, we developed an information extraction-based event detection (NEED) technique that combines information extraction and text categorization techniques to address the problems inherent in traditional feature-based event detection techniques. The empirical evaluation results showed that the NEED technique outperformed traditional feature-based event detection techniques in miss rate and false alarm rate, and achieved an event association accuracy rate comparable to that of its counterpart.
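For context, the sketch below illustrates only the traditional feature-based detection that NEED is compared against: a story is flagged as a new event when its highest cosine similarity to previously seen stories falls below a threshold. The threshold value and toy stories are assumptions, and the information extraction and text categorization components of NEED itself are not reconstructed here.

```python
# Traditional feature-based new-event detection: flag a story as a new event
# when its maximum cosine similarity to earlier stories is below a threshold.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

NEW_EVENT_THRESHOLD = 0.2   # assumed value; tuned on held-out data in practice

def detect_new_events(stream):
    """Yield (story, is_new_event) for each story in arrival order."""
    seen = []
    for story in stream:
        if not seen:
            yield story, True
            seen.append(story)
            continue
        # Refit on the history plus the incoming story so vocabularies match.
        matrix = TfidfVectorizer().fit_transform(seen + [story])
        new_vec, past = matrix[len(seen)], matrix[:len(seen)]
        similarity_to_past = cosine_similarity(new_vec, past).max()
        yield story, similarity_to_past < NEW_EVENT_THRESHOLD
        seen.append(story)

stories = [
    "earthquake of magnitude six strikes coastal city overnight",
    "rescue teams search rubble after coastal city earthquake",
    "parliament passes new budget after lengthy debate",
]
for story, is_new in detect_new_events(stories):
    # The second story shares earthquake terms, so it should not be flagged.
    print(is_new, "-", story)
```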
13

The textcat Package for n-Gram Based Text Categorization in R

Feinerer, Ingo, Buchta, Christian, Geiger, Wilhelm, Rauch, Johannes, Mair, Patrick, Hornik, Kurt 02 1900 (has links) (PDF)
Identifying the language used will typically be the first step in most natural language processing tasks. Among the wide variety of language identification methods discussed in the literature, the ones employing the Cavnar and Trenkle (1994) approach to text categorization based on character n-gram frequencies have been particularly successful. This paper presents the R extension package textcat for n-gram based text categorization, which implements both the Cavnar and Trenkle approach and a reduced n-gram approach designed to remove redundancies of the original approach. A multi-lingual corpus obtained from the Wikipedia pages available on a selection of topics is used to illustrate the functionality of the package and the performance of the provided language identification methods. (authors' abstract)
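As a rough illustration of the Cavnar and Trenkle idea that textcat implements, the sketch below builds ranked character n-gram profiles per language and classifies text by the smallest "out-of-place" distance. It is plain Python written for this summary, not the R package's API, and the miniature training snippets are assumptions.

```python
# Cavnar & Trenkle style language identification: rank character n-grams by
# frequency per language and compare rank positions (out-of-place measure).
from collections import Counter

def ngram_profile(text, n_max=3, top_k=300):
    """Return character n-grams (n = 1..n_max) mapped to their frequency rank."""
    counts = Counter()
    padded = f"_{text.lower()}_"
    for n in range(1, n_max + 1):
        for i in range(len(padded) - n + 1):
            counts[padded[i:i + n]] += 1
    ranked = [gram for gram, _ in counts.most_common(top_k)]
    return {gram: rank for rank, gram in enumerate(ranked)}

def out_of_place(doc_profile, lang_profile):
    """Sum of rank differences; unseen n-grams get a maximum penalty."""
    penalty = len(lang_profile)
    return sum(abs(rank - lang_profile.get(gram, penalty))
               for gram, rank in doc_profile.items())

training = {  # hypothetical miniature training texts
    "english": "the quick brown fox jumps over the lazy dog and the cat",
    "german": "der schnelle braune fuchs springt ueber den faulen hund",
}
profiles = {lang: ngram_profile(text) for lang, text in training.items()}

def identify(text):
    doc = ngram_profile(text)
    return min(profiles, key=lambda lang: out_of_place(doc, profiles[lang]))

print(identify("the dog sleeps under the tree"))   # likely 'english'
```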
14

Teksto turinio analizė dirbtinių neuronų tinklais / Textual analysis using artificial neural networks

Šatas, Arūnas 11 June 2006 (has links)
The theme of this Master's project is the possibility of using artificial neural networks for textual analysis and the automatic categorization of textual documents in editorial programs. The task of the work was to analyze different methods of text classification using different neural networks (SOM, feed-forward, Learning Vector Quantization, etc.). Many researchers work on text classification and artificial neural networks, but little of this research has been fitted to practical use. In this work I tried to identify the possibilities and difficulties of the practical use of text classification. I found that the initial amount and quality of information is very important, and that not all neural networks are suited to solving text categorization problems.
15

ANALYZING AND CATEGORIZING FLOOD DISASTER-RELATED TWEETS FOR EMERGENCY RESPONSE / 危機対応を目的とした洪水災害関連ツイートの分析と分類

Shi, Yongxue 25 March 2019 (has links)
Affiliated degree program: Inter-Graduate School Program for Sustainable Development and Survivable Societies / Kyoto University / 0048 / New-system doctoral course / Doctor of Engineering / 甲第21735号 / 工博第4552号 / 新制||工||1710 (University Library) / Department of Civil and Earth Resources Engineering, Graduate School of Engineering, Kyoto University / (Examination committee) Prof. Tomoharu Hori, Prof. Kaoru Takara, Assoc. Prof. Takahiro Sayama, Prof. Yasuto Tachikawa / Qualifies under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Philosophy (Engineering) / Kyoto University / DGAM
16

Improving Feature Selection Techniques for Machine Learning

Tan, Feng 27 November 2007 (has links)
As a commonly used technique in data preprocessing for machine learning, feature selection identifies important features and removes irrelevant, redundant, or noisy features to reduce the dimensionality of the feature space. It improves the efficiency, accuracy, and comprehensibility of the models built by learning algorithms. Feature selection techniques have been widely employed in a variety of applications, such as genomic analysis, information retrieval, and text categorization. Researchers have introduced many feature selection algorithms with different selection criteria. However, it has been found that no single criterion is best for all applications. We propose a hybrid feature selection framework based on genetic algorithms (GAs) that employs a target learning algorithm to evaluate features (a wrapper method); we call it the hybrid genetic feature selection (HGFS) framework. The advantages of this approach include the ability to accommodate multiple feature selection criteria and to find small subsets of features that perform well for the target algorithm. Experiments on genomic data demonstrate that it is a robust and effective approach that can find feature subsets with higher classification accuracy and/or smaller size than each individual feature selection algorithm. A common characteristic of text categorization tasks is multi-label classification with a great number of features, which makes wrapper methods time-consuming and impractical. We therefore also propose a simple filter (non-wrapper) approach called the Relation Strength and Frequency Variance (RSFV) measure. The basic idea is that informative features are those that are highly correlated with the class and distributed most differently among all classes. The approach is compared with two well-known feature selection methods in experiments on two standard text corpora. The experiments show that RSFV generates equal or better performance than the others in many cases.
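To make the wrapper idea concrete, the following is a compact genetic-algorithm sketch in the spirit of HGFS, not the thesis's actual framework: feature subsets are bit strings whose fitness is the cross-validated accuracy of the target learner on the selected features. The learner, population settings, and synthetic data are all assumptions for illustration.

```python
# Genetic-algorithm wrapper sketch: a feature subset is a bit string, and its
# fitness is the cross-validated accuracy of the target learner (here a linear
# SVM) restricted to the selected columns, minus a small size penalty.
import random
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

rng = random.Random(0)
X, y = make_classification(n_samples=120, n_features=30, n_informative=5,
                           n_redundant=10, random_state=0)

def fitness(mask):
    if not any(mask):
        return 0.0
    cols = [i for i, bit in enumerate(mask) if bit]
    accuracy = cross_val_score(LinearSVC(), X[:, cols], y, cv=3).mean()
    return accuracy - 0.002 * len(cols)   # small penalty favours small subsets

def evolve(n_features, pop_size=20, generations=15, p_mutation=0.05):
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[:pop_size // 2]                # truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [a[i] if rng.random() < 0.5 else b[i]        # crossover
                     for i in range(n_features)]
            child = [1 - bit if rng.random() < p_mutation else bit  # mutation
                     for bit in child]
            children.append(child)
        population = parents + children
    return max(population, key=fitness)

best_mask = evolve(X.shape[1])
selected = [i for i, bit in enumerate(best_mask) if bit]
print(f"{len(selected)} of {X.shape[1]} features kept, "
      f"fitness {fitness(best_mask):.3f}")
```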
17

On Travel Article Classification Based on Consumer Information Search Process Model

Hsiao, Yung-Lin 27 July 2011 (has links)
The information overload problem becomes more pressing with the explosion of information, and people need agents that help them filter information to meet their personal needs. In this work, we study article classification in the tourism domain so as to identify articles that meet users' information needs. We propose an information need orientation model for tourism, which consists of four goals: Initiation, Attraction, Accommodation, and Route planning. These goals can be characterized by 13 features, some of which can be enhanced with WordNet and named entity recognition techniques as supplementary methods. To test the effectiveness of the 13 features and the related classification methods, we collected 15,797 articles from TripAdvisor.com, the world's largest travel site, and randomly selected 600 articles as training data, labeled by two labelers. The experimental results show that our approach generally performs comparably to or better than using purely lexical features, namely TF-IDF, for classification, while using fewer features.
18

An Ensemble Approach for Text Categorization with Positive and Unlabeled Examples

Chen, Hsueh-Ching 29 July 2005 (has links)
Text categorization is the process of assigning new documents to predefined document categories on the basis of one or more classification models induced from a set of pre-categorized training documents. In a typical dichotomous classification scenario, the set of training documents includes both positive and negative examples; that is, each of the two categories is associated with training documents. However, in many real-world text categorization applications, positive and unlabeled documents are readily available, whereas the acquisition of samples of negative documents is extremely expensive or even impossible. In this study, we propose and develop an ensemble approach, referred to as E2, to address the limitations of existing algorithms for learning from positive and unlabeled training documents. Using spam email filtering as the evaluation application, our empirical evaluation results suggest that the proposed E2 technique exhibits more stable and reliable performance than PNB and PEBL.
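The abstract does not spell out how the E2 ensemble is constructed, so the sketch below shows a generic bagging-style baseline for the positive-and-unlabeled setting instead: each member classifier treats a random subsample of the unlabeled documents as provisional negatives, and out-of-bag scores are averaged. It illustrates the problem setting only and is not the E2, PNB, or PEBL algorithm; the toy messages are hypothetical.

```python
# Bagging-style baseline for positive/unlabeled (PU) text data: every member
# treats a random half of the unlabeled set as provisional negatives, and each
# unlabeled message is scored by the members that did NOT train on it.
import numpy as np
from scipy.sparse import vstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

positive = [                     # known spam (positive class)
    "win a free prize now click here",
    "cheap loans approved instantly act now",
    "claim your free lottery winnings today",
]
unlabeled = [                    # mixed, unlabeled mail
    "meeting moved to three pm tomorrow",
    "free prize winner click to claim now",
    "quarterly report attached for review",
    "lunch on friday with the project team",
]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(positive + unlabeled)
X_pos, X_unl = X[:len(positive)], X[len(positive):]

rng = np.random.default_rng(0)
n_unl = X_unl.shape[0]
scores, votes = np.zeros(n_unl), np.zeros(n_unl)
for _ in range(25):                                    # ensemble members
    neg_idx = rng.choice(n_unl, size=n_unl // 2, replace=False)
    oob_idx = np.setdiff1d(np.arange(n_unl), neg_idx)  # held-out unlabeled
    X_train = vstack([X_pos, X_unl[neg_idx]])
    y_train = np.array([1] * X_pos.shape[0] + [0] * len(neg_idx))
    member = MultinomialNB().fit(X_train, y_train)
    scores[oob_idx] += member.predict_proba(X_unl[oob_idx])[:, 1]
    votes[oob_idx] += 1

for text, score in zip(unlabeled, scores / np.maximum(votes, 1)):
    print(f"{score:.2f}  {text}")   # higher score = more spam-like
```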
19

Text Mining: A Burgeoning Quality Improvement Tool

J. Mohammad, Mohammad Alkin Cihad 01 November 2007 (has links) (PDF)
While the amount of textual data available to us is constantly increasing, managing texts by human effort alone is clearly inadequate for the volume and complexity of the information involved. Consequently, the need for automated extraction of useful knowledge from huge amounts of textual data to assist human analysis is apparent. Text mining (TM) is a largely automated technique that aims to discover knowledge from textual data. In this thesis, the notion of text mining, its techniques, and its applications are presented. In particular, the study provides a definition and an overview of the concepts in text categorization, including document representation models, weighting schemes, feature selection methods, feature extraction, performance measures, and machine learning techniques. The thesis details the functionality of text mining as a quality improvement tool. It carries out an extensive survey of text mining applications within the service sector and the manufacturing industry, and presents two broad experimental studies tackling the potential use of text mining in the hotel industry (comment card analysis) and in automobile manufacturing (miles-per-gallon analysis). Keywords: Text Mining, Text Categorization, Quality Improvement, Service Sector, Manufacturing Industry.
20

Investigations of Term Expansion on Text Mining Techniques

Yang, Chin-Sheng 02 August 2002 (has links)
Recent advances in computer and network technologies have contributed significantly to global connectivity and have caused the amount of online textual documents to grow extremely rapidly. The rapid accumulation of textual documents on the Web or within an organization requires effective document management techniques, ranging from information retrieval and information filtering to text mining. The word mismatch problem represents a challenging issue for document management research. Word mismatch has been extensively investigated in information retrieval (IR) research through term expansion (specifically, query expansion). However, a review of the text mining literature suggests that the word mismatch problem has seldom been addressed by text mining techniques. Thus, this thesis investigates the use of term expansion in several text mining techniques, specifically text categorization, document clustering, and event detection. Accordingly, we developed term expansion extensions to these three text mining techniques. The empirical evaluation results showed that term expansion increased categorization effectiveness when correlation coefficient feature selection was employed. With respect to document clustering, the techniques extended with term expansion achieved clustering effectiveness comparable to existing techniques and were superior in improving the clustering specificity measure. Finally, the use of term expansion for supporting event detection degraded detection effectiveness compared to the traditional event detection technique.
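The expansion procedure itself is not detailed in the abstract, so the following is a simple hedged sketch of the general idea: expand each document with the terms that co-occur most strongly with its own terms in a training corpus before feeding it to an ordinary categorizer. The corpus and the top_k setting are assumptions for illustration, not the thesis's actual expansion method.

```python
# Co-occurrence based term expansion: append to each document the corpus terms
# that most often co-occur with the document's own terms, so that related
# documents share vocabulary even when their surface words mismatch.
from collections import Counter, defaultdict

corpus = [
    "stock prices fell sharply after the earnings report",
    "the central bank raised interest rates again",
    "the team won the championship final last night",
    "the striker scored twice in the final match",
]

# Symmetric within-document co-occurrence counts.
cooccurrence = defaultdict(Counter)
for document in corpus:
    terms = set(document.lower().split())
    for term in terms:
        for other in terms:
            if term != other:
                cooccurrence[term][other] += 1

def expand(document, top_k=2):
    """Append up to top_k corpus terms that co-occur most with the document."""
    terms = document.lower().split()
    candidates = Counter()
    for term in terms:
        candidates.update(dict(cooccurrence[term].most_common(top_k)))
    added = [t for t, _ in candidates.most_common() if t not in terms][:top_k]
    return document + " " + " ".join(added)

print(expand("interest rates and the stock report"))
```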
